DATA-DRIVEN RATE-OPTIMAL SPECIFICATION TESTING IN
REGRESSION MODELS

By Emmanuel Guerre and Pascal Lavergne 

LSTA Paris 6 and University of Toulouse GREMAQ and INRA 

We propose new data-driven smooth tests for a parametric re- 
gression function. The smoothing parameter is selected through a 
new criterion that favors a large smoothing parameter under the null 
hypothesis. The resulting test is adaptive rate-optimal and consis- 
tent against Pitman local alternatives approaching the parametric 
model at a rate arbitrarily close to $1/\sqrt{n}$. Asymptotic critical values
come from the standard normal distribution and the bootstrap can 
be used in small samples. A general formalization allows one to con- 
sider a large class of linear smoothing methods, which can be tailored 
for detection of additive alternatives. 

1. Introduction. Consider n observations $(Y_i, X_i)$ in $\mathbb{R} \times \mathbb{R}^p$ and the
heteroscedastic regression model with unknown mean $m(\cdot)$ and variance $\sigma^2(\cdot)$,

$Y_i = m(X_i) + \varepsilon_i$, $E[\varepsilon_i|X_i] = 0$ and $\mathrm{Var}[\varepsilon_i|X_i] = \sigma^2(X_i)$.

We want to test the hypothesis that the regression belongs to some
parametric family $\{\mu(\cdot;\theta); \theta \in \Theta\}$, that is,

(1.1) $H_0 : m(\cdot) = \mu(\cdot;\theta)$ for some $\theta \in \Theta$.

Tests of $H_0$ are called lack-of-fit tests or specification tests. Based on
smoothing techniques, many consistent tests of $H_0$ have been proposed, the so-called
smooth tests; see Hart (1997) for a review. A fundamental issue is the choice 
of the smoothing parameter. Since this is a model selection problem, Eu- 
bank and Hart (1992), Ledwina (1994), Hart [(1997), Chapter 7] and Aerts, 
Claeskens and Hart (1999, 2000), among others, have proposed use of crite- 
ria developed by Akaike (1973) and Schwarz (1978). However, these criteria 
are tailored for estimation but not for testing purposes. Hence, they do not
yield adaptive rate-optimal tests, that is, tests that detect alternatives of un- 
known smoothness approaching the null hypothesis at the fastest possible 
rate when the sample size grows; see Spokoiny (1996). 

Many adaptive rate-optimal specification tests are based on the maximum 
approach, which consists of choosing as a test statistic the maximum of 
Studentized statistics associated with a sequence of smoothing parameters. 
This approach is used for testing the white noise model with normal errors 
by Fan (1996) and for testing a linear regression model with normal errors by 
Fan and Huang (2001) and Baraud, Huet and Laurent (2003), who extend 
the maximum approach. Further work on the linear model includes Spokoiny 
(2001) under homoscedastic errors and Zhang (2003) under heteroscedastic 
errors. Finally, Horowitz and Spokoiny (2001) deal with the general case of 
a nonlinear model with heteroscedastic errors. 

We reconsider the model selection approach to propose a new test with 
some distinctive features. First, our data-driven choice of the smoothing pa- 
rameter relies on a specific criterion tailored for testing purposes. This yields 
an adaptive rate-optimal test. Second, the criterion favors a baseline statistic 
under the null hypothesis. This results in a simple asymptotic distribution 
for our statistic and in bounded critical values for our test. By contrast, in 
the maximum approach, critical values diverge and must practically be eval- 
uated by simulation for any sample size. The computational burden of this 
task can be heavy for a large sample size and a large number of statistics. 
Moreover, diverging critical values are expected to yield some loss of power 
compared to our test. In particular, from an asymptotic viewpoint, our test 
detects local Pitman alternatives converging to the null at a faster rate than 
the ones detected by a maximum test. In small samples, our simulations 
show that our test has better power than a maximum test against irregular 
alternatives. 

In our work we allow for a nonlinear parametric regression model with 
multidimensional covariates, nonnormal errors and heteroscedasticity of
unknown form. In Section 2 we describe the specific aspects of our testing
procedure. In Section 3 we detail the practical construction of the test statistic
for three types of smoothing procedures. Then we give our assumptions and 
main results, which concern the null asymptotic behavior of the test, adap- 
tive rate-optimality, and detection of Pitman local alternatives. In Section 4 
we prove the validity of a bootstrap method and compare the small sample 
performances of our test with a maximum test through a simulation experi- 
ment. In Section 5 we extend our results to general linear smoothing meth- 
ods. Finally, we propose a test whose power against additive alternatives is 
not affected by the curse of dimensionality. Proofs are given in Section 6. 
2. Description of the procedure. Consider a collection $\{\hat T_h, h \in \mathcal{H}_n\}$ of
asymptotically centered statistics which measure the lack-of-fit of the null
parametric model. The index h is a smoothing parameter, chosen in a
discrete grid whose cardinality grows with the sample size n; see our examples in
the next section. A maximum test rejects $H_0$ when
$\max_{h \in \mathcal{H}_n} \hat T_h/\hat v_h \ge z_\alpha^{\max}$,
where $\hat v_h$ estimates the asymptotic null standard deviation of $\hat T_h$. A test in
the spirit of Baraud, Huet and Laurent (2003) rejects the null if
$\hat T_h \ge \hat v_h z_\alpha(h)$ for some h in $\mathcal{H}_n$ or, equivalently, if
$\max_{h \in \mathcal{H}_n} (\hat T_h/\hat v_h - z_\alpha(h)) \ge 0$, where the
critical values are chosen to get an asymptotic $\alpha$-level test, a difficult issue in
practice. Setting $z_\alpha(h) = z_\alpha^{\max}$ yields a maximum test. Because the number
of h increases with n, $z_\alpha^{\max}$ diverges.

On an informal basis, our approach favors a baseline statistic $\hat T_{h_0}$ with
lowest variance among the $\hat T_h$. In practice, $\hat T_{h_0}$ can be designed to yield
high power against parametric or regular alternatives that are of primary
interest for the statistician. However, this statistic may not be powerful
enough against nonparametric or irregular alternatives. We then propose to
combine this baseline statistic with the other statistics $\hat T_h$ in the following
way. Let $\hat v_{h,h_0}$ be some positive estimators of the asymptotic null standard
deviation of $\hat T_h - \hat T_{h_0}$. We select h as

(2.1) $\tilde h = \arg\max_{h \in \mathcal{H}_n} \{\hat T_h - \gamma_n \hat v_{h,h_0}\}
       = \arg\max_{h \in \mathcal{H}_n} \{\hat T_h - \hat T_{h_0} - \gamma_n \hat v_{h,h_0}\}$, where $\gamma_n > 0$.

Our test is

(2.2) Reject $H_0$ when $\hat T_{\tilde h}/\hat v_{h_0} \ge z_\alpha$,

where $z_\alpha$ is the quantile of order $(1 - \alpha)$ of a standard normal.

The distinctive features of our approach are as follows. First, our criterion 
penalizes each statistic by a quantity proportional to its standard deviation, 
while the criteria reviewed in Hart (1997) use a larger penalty proportional 
to the variance. Second, the data-driven choice of the smoothing parameter 
favors $h_0$ under the null hypothesis. Indeed, since $\hat T_h - \hat T_{h_0}$ is of order $\hat v_{h,h_0}$
under $H_0$, $\tilde h = h_0$ asymptotically under $H_0$ if $\gamma_n$ diverges fast enough; see
Theorem 1 below. Hence, the null limit distribution of the test statistic is
the one of $\hat T_{h_0}/\hat v_{h_0}$, that is, the standard normal, and the resulting test
has bounded critical values. Third, our selection procedure allows us to
choose the standardization $\hat v_{h_0}$. We could use $\hat v_{\tilde h}$ instead, which also gives an
asymptotic $\alpha$-level test since $\tilde h = h_0$ asymptotically under $H_0$. But, because
$\hat v_h \ge \hat v_{h_0}$ asymptotically for any admissible h, our standardization gives a
larger critical region under the alternative. This increases power at no cost
from an asymptotic viewpoint; see Fan (1996) for a similar device in wavelet
thresholding tests. Our simulation results show that this effect is already
large in small samples. By contrast, the maximum approach systematically
downweights the statistic $\hat T_h$ with its standard deviation.

Third, compared to a test using a single statistic, our test inherits the
power properties of each of the $\hat T_h$, up to a term $\gamma_n \hat v_{h,h_0}$. Indeed, the
definition of $\tilde h$ yields

$\hat T_{\tilde h} = \max_{h \in \mathcal{H}_n} (\hat T_h - \gamma_n \hat v_{h,h_0}) + \gamma_n \hat v_{\tilde h,h_0}
  \ge \hat T_h - \gamma_n \hat v_{h,h_0}$ for any $h \in \mathcal{H}_n$.

As a consequence, a lower bound for the power of the test is

(2.3) $P(\hat T_{\tilde h} \ge \hat v_{h_0} z_\alpha) \ge P(\hat T_h \ge \hat v_{h_0} z_\alpha + \gamma_n \hat v_{h,h_0})$ for any h in $\mathcal{H}_n$.

Using a penalty proportional to a standard deviation yields a better power
bound than the selection criteria reviewed in Hart (1997). A suitable choice
of the smoothing parameter in the latter power bound allows us to
establish the adaptive rate-optimality of the test; see Theorem 2 below and the
following discussion. Fourth, combining the $\hat T_h$ with our selection procedure
gives a more powerful test than using the baseline statistic $\hat T_{h_0}$. Indeed, since
$\hat v_{h_0,h_0} = 0$, a noteworthy implication of (2.3) is

(2.4) $P(\hat T_{\tilde h} \ge \hat v_{h_0} z_\alpha) \ge P(\hat T_{h_0} \ge \hat v_{h_0} z_\alpha)$.

Theorem 3 below uses the latter inequality to study detection of Pitman
local alternatives approaching the null at a faster rate than in Horowitz and
Spokoiny (2001).
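
The selection rule (2.1) and the test (2.2) can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the function name, the argument layout and the numerical inputs are hypothetical, and the statistics and standard deviations are taken as already computed.

```python
import math

def select_and_test(T, v_diff, v_h0, gamma_n, z_alpha):
    """T[j] is the statistic for grid point h_j (index 0 is the baseline h_0).
    v_diff[j] estimates the std. dev. of T[j] - T[0], so v_diff[0] = 0.
    Returns (index of the selected h, rejection decision)."""
    # Criterion (2.1): penalize each statistic by gamma_n times the
    # standard deviation of its difference with the baseline statistic.
    crit = [t - gamma_n * v for t, v in zip(T, v_diff)]
    j_sel = max(range(len(T)), key=lambda j: crit[j])
    # Test (2.2): the selected statistic is standardized by the
    # baseline standard deviation, not by its own.
    reject = T[j_sel] / v_h0 >= z_alpha
    return j_sel, reject

# Made-up inputs for a grid of six smoothing parameters (J_n = 5).
J_n = 5
gamma_n = 1.5 * math.sqrt(2 * math.log(J_n))
z_alpha = 1.6449  # approximate 95% quantile of the standard normal
v_diff = [0.0, 1.0, 1.8, 2.9, 4.3, 6.2]

T_null = [0.4, 1.0, 1.5, 0.2, -0.3, 2.0]   # no statistic beats its penalty
T_alt = [0.4, 1.0, 1.5, 12.0, 9.0, 8.0]    # one statistic is clearly large
sel_null = select_and_test(T_null, v_diff, 1.0, gamma_n, z_alpha)
sel_alt = select_and_test(T_alt, v_diff, 1.0, gamma_n, z_alpha)
```

In the first configuration the baseline index is retained and the null is not rejected; in the second, the penalized criterion moves to a smaller smoothing parameter and the test rejects.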

3. Main results. For any integer q and any $x \in \mathbb{R}^q$, $|x| = \max_{1 \le i \le q} |x_i|$.
For real deterministic sequences, $a_n \asymp b_n$ means that $a_n$ and $b_n$ have the
same exact order, that is, there is a $C > 1$ with $1/C \le a_n/b_n \le C$ for n large
enough. For real random variables, $A_n \asymp_P B_n$ means that
$P(1/C \le A_n/B_n \le C)$ goes to 1 when n grows. In such statements, uniformity with respect
to a variable means that C can be chosen independently of it. A sequence
$\{m_n(\cdot)\}_{n \ge 1}$ is equicontinuous if, for any $\epsilon > 0$, there is an $\eta > 0$ such that
$\sup_{n \ge 1} |m_n(x) - m_n(x')| \le \epsilon$ for all x, x' with $|x - x'| \le \eta$.

3.1. Construction of the statistics and assumptions. Let $\hat\theta_n$ be the
nonlinear least-squares estimator of $\theta$ in model (1.1), that is,

(3.1) $\hat\theta_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^n (Y_i - \mu(X_i;\theta))^2$,

with an appropriate convention in case of ties. A typical statistic $\hat T_h$ is an
estimator of the mean-squared distance of the regression function from the
parametric model

(3.2) $\min_{\theta \in \Theta} \sum_{i=1}^n (m_n(X_i) - \mu(X_i;\theta))^2$.
From the estimated parametric residuals $\hat U_i = Y_i - \mu(X_i;\hat\theta_n) = m(X_i) - \mu(X_i;\hat\theta_n) + \varepsilon_i$,
$i = 1, \ldots, n$, we can estimate the departure from the parametric
regression using a leave-one-out linear nonparametric estimator
$\hat\delta_h(X_i) = \sum_{j=1, j \ne i}^n \nu_{ij}(h) \hat U_j$ based on some weights $\nu_{ij}(h)$ with smoothing parameter
h. Then (3.2) can be estimated as

(3.3) $\hat T_h = \sum_{i=1}^n \hat U_i \hat\delta_h(X_i)
       = \sum_{1 \le i \ne j \le n} \hat U_i \frac{\nu_{ij}(h) + \nu_{ji}(h)}{2} \hat U_j = \hat U' W_h \hat U$,

where $\hat U = [\hat U_1, \ldots, \hat U_n]'$ and the generic element of $W_h$ is
$w_{ij}(h) = (\nu_{ij}(h) + \nu_{ji}(h))/2$ for $i \ne j$ and $w_{ii}(h) = 0$. Such a $\hat T_h$ is asymptotically normal
under $H_0$; see, for example, de Jong (1987). Examples 1a and 1b come from
projection methods, while Example 2 builds on kernel smoothing.
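
The identity between the leave-one-out sum and the quadratic form in (3.3) can be checked numerically. The sketch below uses plain Python lists; the function names and the small weight matrix are illustrative assumptions, not part of the original procedure.

```python
def symmetrized_weights(nu):
    """Build W_h from raw weights nu: w_ij = (nu_ij + nu_ji) / 2 off the
    diagonal and w_ii = 0 (the leave-one-out construction)."""
    n = len(nu)
    return [[0.0 if i == j else 0.5 * (nu[i][j] + nu[j][i])
             for j in range(n)] for i in range(n)]

def quadratic_form(U, W):
    """Compute U' W U."""
    n = len(U)
    return sum(U[i] * W[i][j] * U[j] for i in range(n) for j in range(n))

def leave_one_out_sum(U, nu):
    """Compute sum_i U_i * delta_h(X_i), with delta_h(X_i) = sum_{j != i} nu_ij U_j."""
    n = len(U)
    return sum(U[i] * sum(nu[i][j] * U[j] for j in range(n) if j != i)
               for i in range(n))

# Arbitrary residuals and raw weights; the diagonal of nu is never used.
U = [1.0, -2.0, 0.5]
nu = [[9.0, 0.2, 0.1],
      [0.4, 9.0, 0.3],
      [0.0, 0.5, 9.0]]
W = symmetrized_weights(nu)
T_quadratic = quadratic_form(U, W)
T_leave_one_out = leave_one_out_sum(U, nu)
```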

Example 1a (Regression on multivariate polynomial functions). Let
$\psi_k(x) = \prod_{\ell=1}^p x_\ell^{k_\ell}$, for $k \in \mathbb{N}^p$ with $|k| = \max_{\ell=1,\ldots,p} k_\ell \le 1/h$.
Let $\Psi_h = [\psi_k(X_i), |k| \le 1/h, i = 1, \ldots, n]$ and
$P_h = \Psi_h (\Psi_h' \Psi_h)^{-1} \Psi_h'$ be the $n \times n$ orthogonal
projection matrix onto the linear subspace of $\mathbb{R}^n$ spanned by $\Psi_h$. The matrix $W_h$
is obtained from $P_h$ by setting its diagonal elements to zero.

Example 1b (Regression on piecewise polynomial functions). Under
the assumption that the support of X is $[0,1]^p$, we consider piecewise
polynomial functions of fixed order $\bar q$ over bins $I_k(h) = \prod_{\ell=1}^p [k_\ell h, (k_\ell + 1)h)$,
$k = (k_1, \ldots, k_p)$, $k_\ell = 0, \ldots, (1/h) - 1$. These functions write

$\psi_{qkh}(x) = \prod_{\ell=1}^p x_\ell^{q_\ell} \, 1(x \in I_k(h))$,

$0 \le |q| = \max_{1 \le \ell \le p} q_\ell \le \bar q$, $1 \le |k| = \max_{1 \le \ell \le p} k_\ell \le 1/h$.

The particular choice $\bar q = 0$ corresponds to the regressogram. The matrix
$W_h$ is constructed as in Example 1a.
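
For the regressogram, the projection matrix has a simple closed form, which makes the construction of $W_h$ easy to illustrate. The sketch below assumes a scalar covariate (p = 1) with values in [0, 1) and piecewise constant functions; the function name and data are hypothetical.

```python
def regressogram_weights(x, h):
    """W_h for the regressogram with p = 1: the projection onto bin
    indicators has entries 1/n_k for two points in the same bin k (n_k is
    the bin count) and 0 otherwise; the diagonal is then set to zero."""
    n_bins = round(1 / h)
    # Bin index of each observation; the clamp keeps x = 1 in the last bin.
    bins = [min(int(xi / h), n_bins - 1) for xi in x]
    counts = {}
    for b in bins:
        counts[b] = counts.get(b, 0) + 1
    n = len(x)
    return [[1.0 / counts[bins[i]] if (i != j and bins[i] == bins[j]) else 0.0
             for j in range(n)] for i in range(n)]

# Six design points and binwidth h = 1/4: bins are {0, 0, 1, 2, 2, 2}.
x = [0.05, 0.10, 0.30, 0.55, 0.60, 0.70]
W = regressogram_weights(x, 0.25)
```

An observation alone in its bin, such as the third point here, gets a row of zeros: after removing the diagonal it does not contribute to the statistic.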

Example 2 (Kernel smoothing). Consider a continuous, nonnegative,
symmetric and bounded kernel $K(\cdot)$ from $\mathbb{R}^p$ that integrates to 1 and has
a positive integrable Fourier transform. These conditions hold for
products of the triangular, normal, Laplace or Cauchy kernels. Define
$K_h(x) = K(x_1/h, \ldots, x_p/h)$. We consider

$\hat T_h = \frac{1}{(n-1) h^p} \sum_{1 \le i \ne j \le n} \hat U_i \frac{K_h(X_i - X_j)}{\sqrt{\hat f_h(X_i) \hat f_h(X_j)}} \hat U_j$,

with $\hat f_h(X_i) = \frac{1}{(n-1) h^p} \sum_{j \ne i} K_h(X_j - X_i)$.
We now turn to variance estimation. The leave-one-out construction of
the $\hat T_h$ gives that the asymptotic conditional variances $v_h^2$ and $v_{h,h_0}^2$ of $\hat T_h$
and $\hat T_h - \hat T_{h_0}$ under $H_0$ are

(3.4) $v_h^2 = 2 \sum_{1 \le i,j \le n} w_{ij}^2(h) \sigma^2(X_i) \sigma^2(X_j)$,
      $v_{h,h_0}^2 = 2 \sum_{1 \le i,j \le n} (w_{ij}(h) - w_{ij}(h_0))^2 \sigma^2(X_i) \sigma^2(X_j)$.

For our main examples,

$v_h^2 \asymp_P h^{-p}$ and $v_{h,h_0}^2 \asymp_P h^{-p} - h_0^{-p}$;

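see Proposition 2 in Section 6. Let $\hat\sigma_n^2(\cdot)$ be a nonparametric estimator of
$\sigma^2(\cdot)$ such that

(3.5) $\max_{1 \le i \le n} \left| \frac{\hat\sigma_n^2(X_i)}{\sigma^2(X_i)} - 1 \right| = o_P(1)$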

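for any equicontinuous sequence of regression functions. For instance, let

(3.6) $\hat\sigma_n^2(X_i) = \frac{\sum_{j=1}^n Y_j^2 \, 1(|X_j - X_i| \le b_n)}{\sum_{j=1}^n 1(|X_j - X_i| \le b_n)}
       - \left( \frac{\sum_{j=1}^n Y_j \, 1(|X_j - X_i| \le b_n)}{\sum_{j=1}^n 1(|X_j - X_i| \le b_n)} \right)^2$,

where $b_n$ is a bandwidth parameter chosen independently of $\mathcal{H}_n$ such that
$n^{1-4/d'} b_n^p$ diverges; see Proposition 3 in Section 6. Consistent estimators of
the variances in (3.4) are

$\hat v_h^2 = 2 \sum_{1 \le i,j \le n} w_{ij}^2(h) \hat\sigma_n^2(X_i) \hat\sigma_n^2(X_j)$,
$\hat v_{h,h_0}^2 = 2 \sum_{1 \le i,j \le n} (w_{ij}(h) - w_{ij}(h_0))^2 \hat\sigma_n^2(X_i) \hat\sigma_n^2(X_j)$.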
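
A plug-in computation of these variance estimators might look as follows. This is a minimal sketch assuming a scalar covariate and a uniform window of half-width b, in the spirit of (3.6); the function names and toy data are illustrative only.

```python
def local_variance(x, y, b):
    """Window estimator of the conditional variance at each x_i: local
    second moment minus squared local mean over {j : |x_j - x_i| <= b}."""
    out = []
    for xi in x:
        window = [yj for xj, yj in zip(x, y) if abs(xj - xi) <= b]
        m1 = sum(window) / len(window)
        m2 = sum(v * v for v in window) / len(window)
        out.append(m2 - m1 * m1)
    return out

def v_squared(W, sig2, W0=None):
    """2 * sum_ij (w_ij - w0_ij)^2 * sig2_i * sig2_j, as in (3.4) with the
    variance function replaced by estimates. With W0=None this is the
    variance of T_h; with W0 = W_{h_0} it is that of T_h - T_{h_0}."""
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            w = W[i][j] - (W0[i][j] if W0 is not None else 0.0)
            total += w * w * sig2[i] * sig2[j]
    return 2.0 * total

x = [0.0, 0.1, 0.2, 0.9]
y = [1.0, 3.0, 1.0, 5.0]
s2 = local_variance(x, y, 0.15)

W_small = [[0.0, 0.5], [0.5, 0.0]]
v2 = v_squared(W_small, [1.0, 2.0])
v2_diff = v_squared(W_small, [1.0, 2.0], W0=W_small)
```

Note that an isolated design point (the last one above) gets a zero variance estimate, which is one practical reason the bandwidth must not be too small.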

Finally, for the sake of parsimony, and following Horowitz and Spokoiny
(2001), Lepski, Mammen and Spokoiny (1997) and Spokoiny (2001), the
set $\mathcal{H}_n$ of admissible smoothing parameters is a geometric grid of $J_n + 1$
smoothing parameters,

(3.7) $\mathcal{H}_n = \{h_j = h_0 a^{-j}, j = 0, \ldots, J_n\}$ for some $a > 1$, $J_n \to +\infty$.

Note that $h_0$ can depend on an empirical measure of the dispersion of the
$X_i$, as in Zhang (2003), and can converge to zero very slowly, say, as $1/\ln n$.
We assume the following:

Assumption D. The i.i.d. $X_i \in [0,1]^p$ have a strictly positive continuous
density over $[0,1]^p$.
Assumption M. The function $\mu(x;\theta)$ is continuous with respect to x in
$[0,1]^p$ and $\theta$ in $\Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^d$. There is a constant $\bar\mu$
such that for all $\theta$, $\theta'$ in $\Theta$ and for all x in $[0,1]^p$,
$|\mu(x;\theta) - \mu(x;\theta')| \le \bar\mu |\theta - \theta'|$.

Assumption E. The $\varepsilon_i$ are independent given $X_1, \ldots, X_n$. For each i,
the distribution of $\varepsilon_i$, given the design, depends only on $X_i$, $E[\varepsilon_i|X_i] = 0$ and
$\mathrm{Var}[\varepsilon_i|X_i] = \sigma^2(X_i)$, where the unknown variance function $\sigma^2(\cdot)$ is
continuous and bounded away from 0. For some $d' > \max(d,4)$, $E[|\varepsilon_i|^{d'} | X_i] < C_1$
for all i.

Assumption W. (i) For any h, the matrix $W_h$ is one from Example 1a,
1b or 2. (ii) The set $\mathcal{H}_n$ is as in (3.7) with $h_{J_n} \asymp (\ln n)^{C_2/p} / n^{2/(4\underline{s}+p)}$, for
some $C_2 > 1$, with $\underline{s} = 5p/4$ in Example 1a and $\underline{s} = p/4$ in Examples 1b and
2. The number a is an integer for Example 1b.

Under Assumption M, the value of the parameter $\theta$ may not be
identified, as in mixture or multiple index models. The restriction on $h_{J_n}$, together
with the definition of $\mathcal{H}_n$, implies that the number $J_n + 1$ of smoothing
parameters is of order $\ln n$ at most. Assumption W(i), which considers specific
nonparametric methods, will be relaxed in Section 5.1, allowing us, in
particular, to consider a baseline statistic $\hat T_{h_0}$ designed for specific parametric
alternatives.
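
As a small illustration of the geometric grid (3.7), the sketch below builds the grid for given values of $h_0$, a and $J_n$. The function name is hypothetical; the numerical values are those of the simulation design in Section 4.2.

```python
def geometric_grid(h0, a, J_n):
    """Grid (3.7): h_j = h0 * a**(-j) for j = 0, ..., J_n."""
    return [h0 * a ** (-j) for j in range(J_n + 1)]

# h_0 = 2^-2, a = 2 and J_n = 5 give six smoothing parameters down to 2^-7.
grid = geometric_grid(0.25, 2, 5)
```

Because the grid is geometric, halving the smallest admissible smoothing parameter adds only one point to the grid, which is why the number of statistics grows at most logarithmically with n.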

3.2. Limit behavior of the test under the null hypothesis. The next
theorem allows for a penalty sequence $\gamma_n$ of exact order $\sqrt{2 \ln \ln n}$, as $J_n$ is of
order $\ln n$.

Theorem 1. Consider a sequence $\{\mu(\cdot;\theta_n), \theta_n \in \Theta\}_{n \ge 1}$ in $H_0$. Let
Assumptions D, M, E and W hold and assume that the variance estimator
satisfies (3.5). If $h_0 \to 0$ and $\gamma_n \to \infty$ with

(3.8) $\gamma_n \ge (1 + \eta) \sqrt{2 \ln J_n}$ for some $\eta > 0$,

the test (2.2) has level $\alpha$ asymptotically given the design, that is,

$P(\hat T_{\tilde h} \ge z_\alpha \hat v_{h_0} \mid X_1, \ldots, X_n) \to_P \alpha$.

Theorem 1 is proved in two main steps. The first step consists in showing
that

(3.9) $P(\tilde h \ne h_0) = P\left( \max_{h \in \mathcal{H}_n \setminus \{h_0\}} \frac{\hat T_h - \hat T_{h_0}}{\hat v_{h,h_0}} > \gamma_n \right)$

goes to zero. This is done by first proving that $(\hat T_h - \hat T_{h_0})/\hat v_{h,h_0}$
asymptotically behaves at first-order as $\varepsilon'(W_h - W_{h_0})\varepsilon / v_{h,h_0}$ uniformly for h in
$\mathcal{H}_n \setminus \{h_0\}$, where $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]'$, and second by bounding the distribution
tails of $\max_{h \in \mathcal{H}_n \setminus \{h_0\}} \varepsilon'(W_h - W_{h_0})\varepsilon / v_{h,h_0}$. Then we show that the limit
distribution of $\hat T_{h_0}/\hat v_{h_0}$ is that of $\varepsilon' W_{h_0} \varepsilon / v_{h_0}$, which converges to a standard
normal when $h_0$ goes to 0.
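
The role of the threshold $\sqrt{2 \ln J_n}$ in (3.8) can be seen from a heuristic union bound. The display below is only a sketch: it assumes that each standardized difference $Z_h$ has exactly Gaussian tails, $P(Z_h > t) \le \exp(-t^2/2)$ for $t \ge 0$, whereas the actual proof has to control the tails of quadratic forms.

```latex
\begin{align*}
P\Bigl(\max_{h \in \mathcal{H}_n \setminus \{h_0\}} Z_h > \gamma_n\Bigr)
  &\le \sum_{h \in \mathcal{H}_n \setminus \{h_0\}} P(Z_h > \gamma_n)
   \le J_n \exp\bigl(-\gamma_n^2/2\bigr) \\
  &\le J_n \exp\bigl(-(1+\eta)^2 \ln J_n\bigr)
   = J_n^{\,1-(1+\eta)^2} \longrightarrow 0
  \quad \text{as } J_n \to \infty .
\end{align*}
```

Under this idealization, any $\eta > 0$ makes the exponent $1 - (1+\eta)^2$ negative, so the probability of selecting a smoothing parameter other than $h_0$ vanishes under the null.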

As done by Horowitz and Spokoiny (2001), Theorem 1 imposes that $h_0$
asymptotically vanishes. This condition yields a pivotal limit distribution
for our test statistic. As shown by Hart [(1997), page 220] under stronger
regularity conditions on the parametric model, considering a fixed $h_0$
generally yields a nonpivotal limit distribution because the estimation error
$\mu(\cdot;\hat\theta_n) - \mu(\cdot;\theta)$ cannot be neglected. Hart (1997) then recommends the use
of a double bootstrap procedure to estimate the critical values of the test.

3.3. Consistency of the test. Theorem 2 below considers general
alternatives with unknown smoothness. Theorem 3 considers Pitman local
alternatives. For any real s, let $\lfloor s \rfloor$ be the lower integer part of s, that is,
$\lfloor s \rfloor < s \le \lfloor s \rfloor + 1$. Let the Hölder class $C_p(L,s)$ be the set of maps $m(\cdot)$ from
$[0,1]^p$ to $\mathbb{R}$ with

$C_p(L,s) = \{m(\cdot); |m(x) - m(y)| \le L|x - y|^s$ for all x, y in $[0,1]^p\}$
for $s \in (0,1]$,

$C_p(L,s) = \{m(\cdot)$; the $\lfloor s \rfloor$th partial derivatives of $m(\cdot)$ are in $C_p(L, s - \lfloor s \rfloor)\}$
for $s > 1$.



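Theorem 2. Consider a sequence of equicontinuous regression functions
$\{m_n(\cdot)\}_{n \ge 1}$ such that for some unknown $s > \underline{s}$ and $L > 0$,
$m_n(\cdot) - \mu(\cdot;\theta) \in C_p(L,s)$ for all $\theta$ in $\Theta$ and all n. Let Assumptions D, M, E and W hold.
Assume that the variance estimator satisfies (3.5), that $1/(C_0 \ln n) \le h_0 \le C_0$
for some $C_0 > 0$ and that $\gamma_n \le n^{\gamma}$ for some $\gamma$ in (0,1). If

(3.10) $\min_{\theta \in \Theta} \left[ \frac{1}{n} \sum_{i=1}^n (m_n(X_i) - \mu(X_i;\theta))^2 \right]^{1/2}
        \ge (1 + o_P(1)) \kappa_1 L^{p/(4s+p)} \left( \frac{\gamma_n \sup_{x \in [0,1]^p} \sigma^2(x)}{n} \right)^{2s/(4s+p)}$,

the test (2.2) is consistent given the design, that is,

$P(\hat T_{\tilde h} \ge \hat v_{h_0} z_\alpha \mid X_1, \ldots, X_n) \to_P 1$,

provided $\kappa_1 = \kappa_1(s) > 0$ is large enough.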



The proof is based upon the power bound (2.3). From this inequality, the
test is consistent if $\hat T_h - z_\alpha \hat v_{h_0} - \gamma_n \hat v_{h,h_0}$ diverges in probability for a suitable
choice of the smoothing parameter h adapted to the unknown smoothness of
the departure from the parametric model. Thus, combining several statistics
in the procedure is crucial to detecting alternatives of unknown smoothness.
A sketch of the proof is as follows. For a departure from the parametric model
in $C_p(L,s)$, $\hat T_h$ estimates $\min_{\theta \in \Theta} \sum_{i=1}^n (m_n(X_i) - \mu(X_i;\theta))^2$ up to a
multiplicative constant with a bias of order $n L^2 h^{2s}$. The standard deviation of $\hat T_h$
is of order $h^{-p/2}$ and the order of $\hat v_{h_0} z_\alpha + \gamma_n \hat v_{h,h_0}$ is $\gamma_n h^{-p/2} \sup_{x \in [0,1]^p} \sigma^2(x)$.
Collecting the leading terms shows that $\hat T_h - \hat v_{h_0} z_\alpha - \gamma_n \hat v_{h,h_0}$ diverges if

$\min_{\theta \in \Theta} \left[ \frac{1}{n} \sum_{i=1}^n (m_n(X_i) - \mu(X_i;\theta))^2 \right]^{1/2}$

is of larger order than

$\left[ \frac{1}{n} \left( n L^2 h^{2s} + \gamma_n h^{-p/2} \sup_{x \in [0,1]^p} \sigma^2(x) \right) \right]^{1/2}$.

Finding the minimum of this quantity with respect to h gives the rate of
(3.10). The rate of the optimal h is $(\gamma_n \sup_{x \in [0,1]^p} \sigma^2(x) / (L^2 n))^{2/(4s+p)}$. The
parsimonious set $\mathcal{H}_n$ is rich enough to contain an h of this order. Our proof
can be easily modified to study the selection procedures considered in Hart
(1997), which use $\gamma_n \hat v_h^2$ in (2.1) instead of $\gamma_n \hat v_{h,h_0}$. This would give the worse
detection rate $(\gamma_n/n)^{s/(2s+p)}$.
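
The balance between the bias term and the penalty term can be checked numerically. The sketch below minimizes the squared bound $L^2 h^{2s} + \gamma_n \sup\sigma^2 \, h^{-p/2}/n$ in h through its first-order condition; the function names and parameter values are illustrative assumptions.

```python
def squared_bound(h, n, L, s, p, gamma_n, sig2_sup):
    """Squared detection threshold: bias term plus penalty term over n."""
    return L ** 2 * h ** (2 * s) + gamma_n * sig2_sup * h ** (-p / 2) / n

def minimizing_h(n, L, s, p, gamma_n, sig2_sup):
    """Root of the first-order condition
    2 s L^2 h^(2s-1) = (p/2) (gamma_n sig2_sup / n) h^(-p/2-1)."""
    return (p * gamma_n * sig2_sup / (4 * s * L ** 2 * n)) ** (2 / (4 * s + p))

def order_of_h(n, L, s, p, gamma_n, sig2_sup):
    """Order of the optimal h stated in the text, without the constant."""
    return (gamma_n * sig2_sup / (L ** 2 * n)) ** (2 / (4 * s + p))

args_small = dict(n=1000, L=1.0, s=2.0, p=1, gamma_n=2.0, sig2_sup=1.0)
args_large = dict(n=10 ** 6, L=1.0, s=2.0, p=1, gamma_n=2.0, sig2_sup=1.0)
h_star = minimizing_h(**args_small)
f_star = squared_bound(h_star, **args_small)
ratio_small = h_star / order_of_h(**args_small)
ratio_large = minimizing_h(**args_large) / order_of_h(**args_large)
```

The ratio of the exact minimizer to the stated order is the constant $(p/(4s))^{2/(4s+p)}$, which does not depend on n; this is the sense in which the two have the same rate.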

For $\gamma_n$ of order $\sqrt{\ln \ln n}$, the smallest order compatible with Theorem 1,
the test detects alternatives (3.10) with rate $(\sqrt{\ln \ln n}/n)^{2s/(4s+p)}$ for any
$s > \underline{s}$. This rate is the optimal adaptive minimax one for the idealistic white
noise model; see Spokoiny (1996). Horowitz and Spokoiny (2001) obtain the
same rate for their kernel-based test but with minimal smoothness index
$\underline{s} = \max(2, p/4)$, while we achieve $\underline{s} = p/4$ for our piecewise polynomial or
kernel-based tests. The value p/4 is critical for the smoothness index s, as
previously noted by Guerre and Lavergne (2002) and Baraud, Huet and
Laurent (2003).



Theorem 3. Let $\theta_0$ be an inner point of $\Theta$ and consider a sequence of
local alternatives $m_n(\cdot) = \mu(\cdot;\theta_0) + r_n \delta_n(\cdot)$, where $\{\delta_n(\cdot)\}_{n \ge 1}$ is an
equicontinuous sequence from $C_p(L,s)$ for some unknown $s > \underline{s}$ and $L > 0$, with

(3.11) $\frac{1}{n} \sum_{i=1}^n \delta_n^2(X_i) = 1 + o_P(1)$ and
       $\frac{1}{n} \sum_{i=1}^n \delta_n(X_i) \frac{\partial \mu(X_i;\theta_0)}{\partial \theta} = o_P(1)$.

Assume that for each x in $[0,1]^p$, $\mu(x;\theta)$ is twice differentiable with respect
to $\theta$ in $\Theta$ with second-order derivatives continuous in x and $\theta$ and that, for
some $C_3 > 0$,

(3.12) $(C_3 + o_P(1)) |\theta - \theta'|^2 \le \frac{1}{n} \sum_{i=1}^n (\mu(X_i;\theta) - \mu(X_i;\theta'))^2$ for any $\theta$, $\theta'$ in $\Theta$.

Let Assumptions D, M, E and W hold and assume that the variance
estimator satisfies (3.5). If $h_0 \to 0$, $r_n \to 0$ and $\sqrt{n} h_0^{p/4} r_n \to \infty$, the test is
consistent given the design.

The rate $r_n$ of Theorem 3 can be made arbitrarily close to $1/\sqrt{n}$ by a
proper choice of $h_0$. This improves upon Horowitz and Spokoiny (2001),
who obtain the rate $\sqrt{\ln \ln n}/\sqrt{n}$.

As stated in Lemma 5 of Section 6, conditions (3.11) and the identification
condition (3.12) ensure that

(3.13) $\min_{\theta \in \Theta} \left[ \frac{1}{n} \sum_{i=1}^n (m_n(X_i) - \mu(X_i;\theta))^2 \right]^{1/2} = r_n - o_P(r_n)$.



As the minimum of (3.13) is achieved for $\theta = \theta_0$ at first-order, $r_n \delta_n(\cdot)$
is asymptotically the departure from $\mu(\cdot;\theta_0)$. When $r_n$ converges to zero,
this departure becomes smoother as it belongs to the smoothness class
$C_p(L r_n, s)$. This sharply contrasts with the departures from the
parametric model in Theorem 2, which can be much more irregular. The proof of
Theorem 3 follows from (2.4). The test is consistent as soon as $\hat T_{h_0} - \hat v_{h_0} z_\alpha$
diverges in probability. We show that $\hat T_{h_0}$ is, up to a multiplicative constant,
an estimate of $r_n^2 \sum_{i=1}^n \delta_n^2(X_i)$ with a negligible bias and a standard
deviation of order $h_0^{-p/2}$. As $\hat v_{h_0}$ is of order $h_0^{-p/2}$, $\hat T_{h_0} - \hat v_{h_0} z_\alpha$ diverges to infinity
as soon as $n r_n^2$ diverges faster than $h_0^{-p/2}$ as required.

4. Bootstrap implementation and small sample behavior. 

4.1. Bootstrap critical values. The wild bootstrap, initially proposed by
Wu (1986), is often used in smooth lack-of-fit tests to compute small
sample critical values; see, for example, Härdle and Mammen (1993). Here
we use a generalization of this method, the smooth conditional moments
bootstrap introduced by Gozalo (1997). It consists of drawing n i.i.d.
random variables $\omega_i$ independently from the original sample with $E\omega_i = 0$,
$E\omega_i^2 = 1$ and $E|\omega_i|^{d'} < \infty$, and generating bootstrap observations of $Y_i$ as
$Y_i^* = \mu(X_i;\hat\theta_n) + \hat\sigma_n(X_i) \omega_i$, $i = 1, \ldots, n$. A bootstrap test statistic
$\hat T^*_{\tilde h^*}/\hat v^*_{h_0}$ is built from the bootstrap sample, as was the original test statistic. When this
scheme is repeated many times, the bootstrap critical value $z^*_{\alpha,n}$ at level $\alpha$ is
the empirical $1 - \alpha$ quantile of the bootstrapped test statistics. This critical
value is then compared to the initial test statistic. The following theorem
establishes the first-order consistency of this procedure.

Theorem 4. Let $Y_i = m_n(X_i) + \varepsilon_i$, $i = 1, \ldots, n$, be the initial model,
where $\{m_n(\cdot)\}_{n \ge 1}$ is any equicontinuous sequence of functions. Under the
assumptions of Theorem 1 and for the variance estimator $\hat\sigma_n^2(X_i)$ of (3.6),

$\sup_{z \in \mathbb{R}} |P(\hat T^*_{\tilde h^*}/\hat v^*_{h_0} \le z \mid X_1, Y_1, \ldots, X_n, Y_n) - P(N(0,1) \le z)| \to_P 0$.
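
The bootstrap scheme of this section can be sketched as follows. This is an illustration only: the `statistic` argument stands for a user-supplied function that recomputes the studentized test statistic from bootstrap responses, `toy_statistic` is a hypothetical placeholder for it, and the Rademacher multipliers are a simple stand-in with mean 0 and variance 1 rather than the distribution used in the simulations.

```python
import math
import random

def rademacher(rng):
    # Stand-in multiplier: +1 or -1 with equal probability.
    return 1.0 if rng.random() < 0.5 else -1.0

def bootstrap_critical_value(fitted, sigma_hat, statistic, alpha, B, rng,
                             draw=rademacher):
    """Return (critical value, sorted bootstrap statistics)."""
    stats = []
    for _ in range(B):
        # Y*_i = mu(X_i; theta_hat) + sigma_hat(X_i) * omega_i
        y_star = [m + s * draw(rng) for m, s in zip(fitted, sigma_hat)]
        stats.append(statistic(y_star))
    stats.sort()
    # Empirical (1 - alpha) quantile: smallest value with at least a
    # (1 - alpha) share of the bootstrap statistics at or below it.
    # The small tolerance guards against floating-point rounding.
    k = max(math.ceil((1.0 - alpha) * B - 1e-9) - 1, 0)
    return stats[k], stats

def toy_statistic(y):
    # Hypothetical placeholder for the studentized test statistic.
    return sum(y) / math.sqrt(len(y))

rng = random.Random(0)
crit, stats = bootstrap_critical_value([0.0] * 50, [1.0] * 50,
                                       toy_statistic, 0.05, 199, rng)
# Degenerate check: with zero noise every bootstrap statistic is identical.
crit0, _ = bootstrap_critical_value([1.0, 2.0], [0.0, 0.0],
                                    toy_statistic, 0.05, 19, random.Random(1))
```

The test then rejects when the statistic computed on the original sample exceeds the returned critical value.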

4.2. Small sample behavior. We investigated the small sample behavior
of our bootstrap test. We generated samples of 150 observations through the
model

(4.1) $Y = \theta_1 + \theta_2 X + r \cos(2\pi t X) + \varepsilon$, $r \in \{0, \sqrt{2/3}\}$, $t \in \{2, 5, 10\}$,

where X is distributed as $U[-1,1]$. The null hypothesis corresponds to $r = 0$,
while under the alternatives $r^2 = 2/3$ and $E[r^2 \cos^2(2\pi t X)]/E\varepsilon^2 = 1/3$
for any integer t, a quite small signal-to-noise ratio. When t increases, the
deviation from the linear model becomes more oscillating and irregular, and
then more difficult to detect.
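
The design (4.1) and its signal-to-noise ratio can be reproduced with a short sketch. The function names are hypothetical, unit error variance is assumed, and the expectation is evaluated by a midpoint rule rather than by simulation.

```python
import math
import random

def simulate(n, theta1, theta2, r, t, rng):
    """Draw n observations from model (4.1) with standard normal errors."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [theta1 + theta2 * x + r * math.cos(2 * math.pi * t * x)
          + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def signal_to_noise(r, t, m=2000):
    """Midpoint-rule value of E[r^2 cos^2(2 pi t X)] for X ~ U[-1, 1],
    divided by a unit error variance."""
    total = 0.0
    for k in range(m):
        x = -1.0 + (k + 0.5) * 2.0 / m
        total += (r * math.cos(2 * math.pi * t * x)) ** 2
    return total / m

r_alt = math.sqrt(2.0 / 3.0)
ratios = [signal_to_noise(r_alt, t) for t in (2, 5, 10)]
xs, ys = simulate(150, 0.0, 0.0, r_alt, 5, random.Random(0))
```

The ratio is the same for every integer frequency t because the squared cosine averages to 1/2 over a whole number of periods.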

To compute our test statistic, we used the regressogram method of
Example 1b with half-binwidths in

$\mathcal{H}_n = \{h_0 = 2^{-2}, h_1 = 2^{-3}, \ldots, h_5 = 2^{-7}\}$.

The smallest binwidth thus defines 128 cells, which is sufficient for 150
observations. The $\gamma_n$ was set to $c\sqrt{2 \ln J_n}$, where c = 1, 1.5, 2. For each experiment
we ran 5000 replications under the null and 1000 under the alternative. For
each replication the bootstrap critical values were computed from 199
bootstrap samples. For $\omega_i$ we used the two-point distribution

$P\left(\omega_i = \frac{1 - \sqrt{5}}{2}\right) = \frac{5 + \sqrt{5}}{10}$,
$P\left(\omega_i = \frac{1 + \sqrt{5}}{2}\right) = \frac{5 - \sqrt{5}}{10}$,

which verifies the required conditions.
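
The moment conditions for this two-point distribution are easy to verify numerically. The short sketch below computes its first three moments; the variable and function names are illustrative.

```python
import math

SQRT5 = math.sqrt(5.0)
values = [(1.0 - SQRT5) / 2.0, (1.0 + SQRT5) / 2.0]
probs = [(5.0 + SQRT5) / 10.0, (5.0 - SQRT5) / 10.0]

def moment(k):
    """k-th moment of the two-point distribution."""
    return sum(p * v ** k for v, p in zip(values, probs))

mean, second, third = moment(1), moment(2), moment(3)
```

The mean is 0 and the second moment is 1, as required, and the third moment also equals 1. Since the distribution has bounded support, all absolute moments are finite.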

In a first stage we set $(\theta_1, \theta_2) = (0, 0)$ and performed a test for white noise,
that is, $H_0 : m(\cdot) = 0$, with homoscedastic errors following a standard normal
distribution (Table 1). We estimated the variance under homoscedasticity
by

$\hat\sigma_n^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (Y_{(i+1)} - Y_{(i)})^2$,

where $Y_{(i)}$ denote observations ordered according to the order of the $X_i$. This
estimate is consistent under the null and the alternative; see Rice (1984).
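
The difference-based variance estimate above takes only a few lines to compute. The sketch below is a direct transcription for a scalar covariate; the function name and data are illustrative.

```python
def rice_variance(x, y):
    """Half the average squared difference between responses that are
    adjacent once the sample is sorted by the covariate."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ys = [y[i] for i in order]
    n = len(ys)
    return sum((ys[i + 1] - ys[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))

# Sorted by x, the responses are 1, 3, 2, 0: squared differences 4, 1, 4.
sigma2_hat = rice_variance([0.3, 0.1, 0.2, 0.4], [2.0, 1.0, 3.0, 0.0])
sigma2_const = rice_variance([0.3, 0.1, 0.2], [5.0, 5.0, 5.0])
```

Differencing neighbouring responses removes most of a smooth regression function, which is why the estimate does not require fitting the mean first.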
In each cell of the tables, the first and second rows give empirical
percentages of rejections at 2% and 5% nominal levels. We compare our test to
(i) simple benchmark tests based on fixed bandwidths $h_0$ and $h_5$, to
evaluate the effect of a data-driven bandwidth, (ii) the maximum test based on
Max $= \max_{h \in \mathcal{H}_n} \hat T_h/\hat v_h$, to evaluate the gain of our approach and (iii) a test
based on $\hat T_{\tilde h}/\hat v_{\tilde h}$, to evaluate the effect of our standardization. For each test,
we computed bootstrap critical values as for our test.

Under the null hypothesis, the bootstrap leads to accurate rejection
probabilities for all tests. Under the considered alternatives, empirical power
decreases for all tests when the frequency increases from t = 2 to t = 10. The
data-driven tests always dominate the test based on the fixed parameter $h_0$,
which behaves poorly. For the low frequency alternatives, data-driven tests
perform very well with power greater than 90% and 95% at a 2% and 5%
nominal level, respectively, and there are no significant differences between
them. For higher frequency alternatives, differences are significant. Our test
has quite high power and rejects the null hypothesis at more than 85% and
60% at a 5% level when t = 5 and 10, respectively. It performs better than
or as well as does the test based on $h_5$ designed for irregular alternatives,
except for c = 2 and t = 10. It always dominates Max with differences ranging
from 7.1% to 18.3%, depending on the level. The test based on $\hat T_{\tilde h}/\hat v_{\tilde h}$
behaves as the Max test. This suggests that the high performances of our test
are mainly explained by our standardization choice, which is made possible
by our selection procedure.

To check whether these conclusions are affected by the details of the exper- 
iments, we consider errors following a centered and standardized exponential 



Table 1
White noise model, Gaussian errors

                                            $\hat T_{\tilde h}/\hat v_{\tilde h}$                Our test
          Level   $h_0$         $h_5$   Max     c = 1   c = 1.5   c = 2     c = 1   c = 1.5   c = 2
$H_0$     2%      1.9           2.1     2.0     2.0     2.0       2.0       1.8     1.8       1.7
          5%      5.3           5.1     4.2     4.3     4.2       4.4       4.4     4.3       4.4
t = 2     2%      [illegible]   60.6    90.5    90.7    90.0      90.5      91.7    91.3      91.9
          5%      9.0           72.5    96.0    96.3    95.9      96.2      95.4    95.7      97.3
t = 5     2%      3.0           59.2    66.3    66.9    66.3      66.3      77.3    78.5      78.8
          5%      7.7           73.3    79.2    79.8    79.4      79.5      88.7    88.5      87.8
t = 10    2%      3.4           50.5    32.8    32.5    32.5      32.7      48.4    49.2      49.2
          5%      7.0           66.0    49.3    50.2    49.3      48.8      65.6    65.5      59.9

Percentages of rejection at 2% and 5% nominal levels.
(Table 2), a standardized Student with five degrees of freedom (Table 3), a
normal distribution with conditional variance $\sigma^2(X) = (1 + 3X^2)/3$ using our
estimator (3.6) with $b_n = 1/8$ (Table 4) and a linear model with homoscedastic
normal errors and $(\theta_1, \theta_2) = (1, 3)$ (Table 5). As results for $\hat T_{\tilde h}/\hat v_{\tilde h}$ are very
similar to the ones for Max, we do not report them. For exponential errors,
there is a slight tendency to overrejection. It is likely that matching
third-order moments in the bootstrap sample generation as proposed by Gozalo
(1997) would lead to more accurate critical values. Heteroscedasticity does
not adversely affect the behavior of the tests. For the linear model, there is
some gain in power for the Max test compared with Table 1, but differences
with our test remain significant for the two high-frequency alternatives.

5. Extensions to general nonparametric methods and additive alterna- 
tives. 

5.1. General nonparametric methods. We give here some general suffi- 
cient conditions ensuring the validity of our results. These conditions could 
be checked for other smoothing methods or other designs than the ones con- 
sidered here. Indeed, different smoothing methods can be used for specifica- 
tion testing; see, for example, Chen (1994) for spline smoothing, Fan, Zhang 
and Zhang (2001) for local polynomials and Spokoiny (1996) for wavelets. 
Also, our conditions allow for various constructions of the quadratic forms 
$\hat T_h$; see, for example, Dette (1999) and Härdle and Mammen (1993).

For an $n \times n$ matrix W, let $\mathrm{Sp}_n[W]$ be its spectral radius and
$N_n^2[W] = \mathrm{Tr}[W'W] = \sum_{i,j} w_{ij}^2$. For W symmetric, the former is its largest eigenvalue
in absolute value and the latter is the sum of its squared eigenvalues.
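
Both matrix quantities can be computed for a small symmetric example. The sketch below uses plain Python lists and a power iteration on $W^2$ to obtain the spectral radius; this numerical method and the function names are assumptions made for illustration, not part of the original text.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def frobenius_sq(W):
    """Sum of squared entries of W."""
    return sum(w * w for row in W for w in row)

def spectral_radius_sym(W, iters=200):
    """Largest absolute eigenvalue of a symmetric W, via power iteration
    on W^2 (positive semidefinite, with eigenvalues lambda^2). Assumes the
    start vector is not orthogonal to the leading eigenspace."""
    n = len(W)
    W2 = matmul(W, W)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(W2[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    w = [sum(W2[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(a * b for a, b in zip(v, w)))

# Symmetric, zero diagonal; eigenvalues are -sqrt(2), 0 and sqrt(2).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
radius = spectral_radius_sym(W)
n2 = frobenius_sq(W)
W_sq = matmul(W, W)
trace_W_sq = sum(W_sq[i][i] for i in range(3))
```

Here the sum of squared entries equals 4, which matches the sum of squared eigenvalues 2 + 0 + 2.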



Table 2
White noise model, exponential errors

                                               Our test
          Level   $h_0$   $h_5$   Max     c = 1   c = 1.5   c = 2
$H_0$     2%      2.9     2.9     3.3     3.3     3.2       3.4
          5%      6.1     6.2     6.7     6.3     5.9       6.5
t = 2     2%      4.5     65.4    91.9    92.2    92.4      92.6
          5%      9.0     77.7    95.9    96.1    96.3      97.2
t = 5     2%      5.6     61.4    66.5    76.7    77.0      78.6
          5%      9.6     71.7    78.9    86.1    87.0      86.0
t = 10    2%      3.6     50.6    35.4    51.3    52.8      53.7
          5%      7.6     64.5    52.3    65.5    65.6      62.0

Percentages of rejection at 2% and 5% nominal levels.
Table 3
White noise model, Student errors

                                               Our test
          Level   $h_0$   $h_5$   Max     c = 1   c = 1.5   c = 2
$H_0$     2%      2.3     2.1     2.0     1.8     1.7       1.9
          5%      5.0     4.8     4.4     4.5     4.3       4.4
t = 2     2%      5.2     60.4    91.8    91.9    92.2      92.1
          5%      9.2     73.3    95.7    95.5    95.8      96.2
t = 5     2%      3.4     60.6    66.6    77.6    77.7      79.0
          5%      8.4     74.6    79.3    88.2    88.2      86.9
t = 10    2%      3.6     48.8    32.2    48.1    48.5      49.4
          5%      7.8     65.1    48.1    63.1    64.2      60.0

Percentages of rejection at 2% and 5% nominal levels.



Assumption W0. Let $\mathcal{H}_n$ be as in (3.7) with $h_{J_n} \asymp (\ln n)^{C_2/p} / n^{2/(4\underline{s}+p)}$
for some $\underline{s} > 0$, $C_2 > 1$ and $h_0 \to 0$. The collection of $n \times n$ matrices
$\{W_h, h \in \mathcal{H}_n\}$ is such that: (i) For all h, $W_h = [w_{ij}(h), 1 \le i,j \le n]$ depends only upon
$X_1, \ldots, X_n$ and is real symmetric with $w_{ii}(h) = 0$ for all i. (ii) $\max_{h \in \mathcal{H}_n} \mathrm{Sp}_n[W_h] = O_P(1)$.
(iii) $N_n^2[W_h] \asymp_P h^{-p}$ for all $h \in \mathcal{H}_n$ and uniformly in $h \in \mathcal{H}_n \setminus \{h_0\}$,
$N_n^2[W_h - W_{h_0}] \asymp_P h^{-p} - h_0^{-p}$.

Assumption W1. Let $\mathcal{H}_n$, $\underline{s}$ and $h_{J_n}$ be as in Assumption W0. For any
sequence $h_n = h_{j_n}$ from $\mathcal{H}_n$: (i) There are some symmetric positive
semidefinite matrices $P_{h_n}$ with $\mathrm{Sp}_n[W_{h_n} - P_{h_n}] = o_P(1)$. (ii) For any $s > \underline{s}$, there is



Table 4
White noise model, heteroscedastic errors

                                               Our test
          Level   $h_0$   $h_5$   Max     c = 1   c = 1.5   c = 2
$H_0$     2%      2.2     2.2     1.8     1.7     1.5       1.6
          5%      5.1     5.0     4.7     4.2     4.1       4.2
t = 2     2%      3.0     62.3    92.6    94.1    93.9      94.9
          5%      5.9     76.3    98.0    97.9    98.4      98.7
t = 5     2%      1.6     64.4    62.9    82.9    83.5      83.9
          5%      4.2     78.9    81.9    91.9    92.8      91.6
t = 10    2%      2.2     57.8    26.8    53.3    53.7      53.2
          5%      5.6     72.8    50.3    69.5    71.3      63.5

Percentages of rejection at 2% and 5% nominal levels.
Table 5
Linear model, Gaussian errors

                                               Our test
          Level   $h_0$   $h_5$   Max     c = 1   c = 1.5   c = 2
$H_0$     2%      2.3     2.1     1.9     1.9     2.0       2.0
          5%      5.0     5.0     4.4     4.5     4.5       5.0
t = 2     2%      3.0     59.8    93.6    91.0    91.2      91.1
          5%      6.3     71.7    96.7    95.5    95.6      96.8
t = 5     2%      2.7     58.2    73.2    77.7    77.9      78.5
          5%      5.8     72.7    85.0    88.4    88.2      88.4
t = 10    2%      3.0     48.2    41.9    50.4    50.6      50.0
          5%      7.0     64.4    58.8    66.0    66.2      61.8

Percentages of rejection at 2% and 5% nominal levels.



a set $\Pi_{s,n}$ of functions from $[0,1]^p$ to $\mathbb{R}$ such that for any $L > 0$ and any $\delta(\cdot)$ in $C_p(L,s)$, there is a $\pi(\cdot)$ in $\Pi_{s,n}$ with $\sup_{x\in[0,1]^p}|\delta(x) - \pi(x)| \le C_4 L h_n^s$ for some $C_4 = C_4(s) > 0$. (iii) Let

$$\Lambda_n^2 = \Lambda_n^2(s,h_n) = \inf_{\pi\in\Pi_{s,n}} \frac{\sum_{1\le i,j\le n}\pi(X_i)\,p_{ij}(h_n)\,\pi(X_j)}{\sum_{i=1}^n \pi(X_i)^2},$$

where $p_{ij}(h_n)$ is the generic element of $P_{h_n}$. For any $s > \underline{s}$, there is a constant $C_5 = C_5(s) > 0$ such that $P(\Lambda_n > C_5) \to 1$.

Assumption W1 describes the approximation properties of the nonparametric method used to build the $W_h$ and allows us to extend a result of Ingster [(1993), page 253 and following]; see Lemma 6 in Section 6. The next proposition shows that our main examples satisfy Assumptions W0 and W1 under a regular i.i.d. random design.

Proposition 1. Assume that Assumption D holds, and let $\underline{s}$ be as in Assumption W. Then Examples 1a, 1b and 2 satisfy Assumptions W0 and W1.

The next theorem extends our main results under Assumptions W0 and W1. In Section 6 we actually show Theorems 1-4 by proving Theorem 5 and Proposition 1.

Theorem 5. Theorems 1 and 4 hold under Assumption W0 in place of Assumptions D and W. Theorems 2 and 3 hold under Assumptions W0 and W1 in place of Assumptions D and W.

5.2. Additive alternatives. Our general framework easily adapts to detection of specific alternatives. We focus here on additive nonparametric regressions $m(x) = m_1(x_1) + \cdots + m_p(x_p)$. The null hypothesis is

$$H_0 : m(\cdot) = \mu(\cdot;\theta) \text{ for some } \theta \in \Theta, \quad \text{where } \mu(x;\theta) = \mu_1(x_1;\theta) + \cdots + \mu_p(x_p;\theta).$$

For ease of notation, we consider a modification of Example 1a where we remove cross-products of polynomial functions. Let $X_i = [X_{1i},\ldots,X_{pi}]'$ and consider the $n \times (p/h)$ matrix $\Psi_h = [X_{1i}^k,\ldots,X_{pi}^k,\ i = 1,\ldots,n,\ k = 0,\ldots,1/h]$. Let $W_h$ be the matrix obtained from $\Psi_h(\Psi_h'\Psi_h)^{-1}\Psi_h'$ by setting the diagonal entries to 0 and $\hat T_h$ defined as in (3.3).
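
The construction of $W_h$ for additive alternatives can be sketched numerically. The block below is a minimal illustration, not the authors' code: the function name `additive_weight_matrix`, the uniform design and the value of `h` are assumptions made for the example, and a pseudo-inverse is used because the constant column ($k = 0$) appears once per coordinate, which makes the design rank deficient.

```python
import numpy as np

def additive_weight_matrix(X, h):
    """Build W_h from the additive polynomial design (illustrative sketch)."""
    n, p = X.shape
    K = int(round(1.0 / h))            # highest polynomial degree, k = 0, ..., 1/h
    # Columns X_{ji}^k for each coordinate j and degree k; no cross-products.
    cols = [X[:, j] ** k for j in range(p) for k in range(K + 1)]
    Psi = np.column_stack(cols)        # stored with n rows
    # Orthogonal projection onto the column space of Psi; pinv copes with
    # the repeated constant column.
    P = Psi @ np.linalg.pinv(Psi)
    W = P - np.diag(np.diag(P))        # set the diagonal entries to 0
    return (W + W.T) / 2.0             # symmetrize against rounding error

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))         # hypothetical design on [0, 1]^2
W = additive_weight_matrix(X, h=0.5)
```

The resulting matrix is symmetric with a zero diagonal, as required of the $W_h$ in Assumption W0(i).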

Theorem 6. Let the matrices $W_h$ be as above and $\mathcal{H}_n$ be as in (3.7), with $h_{J_n} \asymp (\ln n)^{C_6}/n^{1/3}$ for some $C_6 > 1$. Let Assumptions D, E and M hold. Consider a sequence of additive equicontinuous regression functions $\{m_n(\cdot)\}_{n\ge 1}$ and assume that the variance estimator satisfies (3.5).

(i) For $h_0$ and $\gamma_n$ as in Theorem 1, the test is asymptotically of level $\alpha$ given the design.

(ii) Assume that for some unknown $s > 5/4$ and $L > 0$, $m_n(\cdot) - \mu(\cdot;\theta)$ is in $C_p(L,s)$ for all $\theta$ in $\Theta$ and all $n$. For $h_0$ and $\gamma_n$ as in Theorem 2 and

$$\min_{\theta\in\Theta}\left[\frac{1}{n}\sum_{i=1}^n \big(m_n(X_i) - \mu(X_i;\theta)\big)^2\right]^{1/2} \ge (1 + o_P(1))\,\kappa_2 L^{1/(4s+1)}\left(\frac{\gamma_n \sup_{x\in[0,1]^p}\sigma^2(x)}{n}\right)^{2s/(4s+1)},$$

the test is consistent given the design provided $\kappa_2 = \kappa_2(s)$ is large enough.

The proof of Theorem 6 repeats the proofs of Theorems 1 and 2 with $v^2_{h,h_0}$ of order $(h^{-1} - h_0^{-1})$ instead of $(h^{-p} - h_0^{-p})$ and is therefore omitted. One can also show consistency of the test against Pitman additive alternatives that approach the parametric model at rate $o(1/\sqrt{n h_0^{1/2}})$. The bootstrap procedure described in Section 4.1 also remains valid.
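
To see what the additive structure buys, one can compare rate exponents. The sketch below assumes that the detection rate in Theorem 6(ii) has exponent $2s/(4s+1)$, the one-dimensional analogue of the exponent $2s/(4s+p)$ that appears in the proof of Theorem 2; the helper name `rate_exponent` and the values $s = 2$, $p = 3$ are chosen only for illustration.

```python
from fractions import Fraction

def rate_exponent(s, dim):
    """Exponent 2s / (4s + dim) of the detection rate (gamma_n / n)^exponent."""
    s = Fraction(s)
    return 2 * s / (4 * s + dim)

general = rate_exponent(2, 3)    # p = 3 covariates, smoothness s = 2
additive = rate_exponent(2, 1)   # additive case, assumed to behave like dim = 1
print(general, additive)
```

A larger exponent means that smaller departures from the parametric model can be detected, so under these assumptions the additive test avoids the loss due to the dimension $p$.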



6. Proofs. This section is organized as follows. In Section 6.1 we study the quadratic forms $\varepsilon'(W_h - W_{h_0})\varepsilon$ and $\varepsilon'W_h\varepsilon$ under $H_0$. Section 6.2 recalls some results related to variance estimation. In Section 6.3 we gather preliminary results on the parametric estimation error $m_n(\cdot) - \mu(\cdot;\hat\theta_n)$. In Sections 6.4 and 6.5 we establish Theorems 1 and 4 under Assumption W0. In Sections 6.6 and 6.7 we establish Theorems 2 and 3 under Assumptions W0 and W1. Thus, Theorem 5 is a direct consequence of Sections 6.4-6.7. Section 6.8 deals with Proposition 1.






We denote $Y = [Y_1,\ldots,Y_n]'$ and $\varepsilon = [\varepsilon_1,\ldots,\varepsilon_n]'$. For any $\delta(\cdot)$ from $\mathbb{R}^p$ to $\mathbb{R}$, $\delta = \delta(X) = [\delta(X_1),\ldots,\delta(X_n)]'$ and $D_n(\delta)$ is the $n \times n$ diagonal matrix with entries $\delta(X_i)$. Let $\|\cdot\|_n$ and $(\cdot,\cdot)_n$ be the Euclidean norm and inner product on $\mathbb{R}^n$ divided by $n$, respectively, that is,

$$\|\delta\|_n^2 = \|\delta(X)\|_n^2 = \frac{1}{n}\sum_{i=1}^n \delta^2(X_i)$$

and

$$(\varepsilon,\delta)_n = (\varepsilon,\delta(X))_n = \frac{1}{n}\sum_{i=1}^n \varepsilon_i\,\delta(X_i).$$

This gives $\mathrm{Sp}_n[W] = \max_{\|u\|_n=1}\|Wu\|_n = \max_{\|u\|_n=1}|u'Wu|/n$ for a symmetric $W$. Recall that $\mathrm{Sp}_n[AB] \le \mathrm{Sp}_n[A]\,\mathrm{Sp}_n[B]$. Let $\bar\theta_n = \bar\theta_{n,m}$ be such that

(6.1) $\min_{\theta\in\Theta}\|m(X) - \mu(X;\theta)\|_n = \|m(X) - \mu(X;\bar\theta_n)\|_n.$

We use the notation $P_n(A)$ for $P(A \mid X_1,\ldots,X_n)$, $E_n[\cdot]$ and $\mathrm{Var}_n[\cdot]$ being the associated conditional mean and variance. In what follows, $C$ and $C'$ are positive constants that may vary from line to line. An absolute constant depends neither on the design nor on the distribution of the $\varepsilon_i$ given the design.
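
The two characterizations of the spectral radius and its submultiplicativity can be checked numerically. This is a small sketch on random symmetric matrices; the helper `sp_n` and the matrices `A` and `B` are hypothetical and only serve the illustration.

```python
import numpy as np

def sp_n(W):
    """Spectral radius of a symmetric matrix: largest absolute eigenvalue."""
    return float(np.max(np.abs(np.linalg.eigvalsh(W))))

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

# Rayleigh-quotient form: max over ||u||_n = 1 (that is, u'u = n) of |u'Wu| / n,
# attained at a suitably scaled eigenvector.
eigvals, eigvecs = np.linalg.eigh(A)
top = np.argmax(np.abs(eigvals))
u = np.sqrt(n) * eigvecs[:, top]           # scaled so that u'u = n
rayleigh = abs(u @ A @ u) / n

# AB need not be symmetric, so its operator (2-)norm is used for Sp_n[AB].
sp_AB = float(np.linalg.norm(A @ B, 2))
```

Here `rayleigh` coincides with `sp_n(A)` up to rounding, and `sp_AB` does not exceed `sp_n(A) * sp_n(B)`.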



6.1. Study of quadratic forms. The proof of Lemma 1 is omitted. 



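Lemma 1. Let $W$ be an $n \times n$ symmetric matrix with zeros on the diagonal. Under Assumption E, $E_n[\varepsilon'W\varepsilon] = 0$ and $\mathrm{Var}_n[\varepsilon'W\varepsilon] = 2\sum_{1\le i,j\le n} w_{ij}^2\,\sigma^2(X_i)\,\sigma^2(X_j) = 2N_n^2[D_n(\sigma)WD_n(\sigma)] \asymp N_n^2[W]$.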
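
Lemma 1 can be illustrated by simulation. The sketch below draws a hypothetical zero-diagonal symmetric matrix and hypothetical values for $\sigma(X_i)$, and uses Gaussian errors purely for convenience (an assumption of the demo, not of the lemma); it compares the variance formula with the squared Frobenius norm and with a Monte Carlo estimate.

```python
import numpy as np

def quadratic_form_variance(W, sigma):
    """2 * sum_{i,j} w_ij^2 sigma_i^2 sigma_j^2 for a zero-diagonal symmetric W."""
    s2 = sigma ** 2
    return 2.0 * float(s2 @ (W ** 2) @ s2)

rng = np.random.default_rng(2)
n = 20
W = rng.standard_normal((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
sigma = rng.uniform(0.5, 1.5, size=n)      # hypothetical sigma(X_i) values

exact = quadratic_form_variance(W, sigma)
D = np.diag(sigma)
frobenius = 2.0 * np.linalg.norm(D @ W @ D, "fro") ** 2

# Monte Carlo draws of eps' W eps with independent heteroscedastic errors.
eps = rng.standard_normal((100_000, n)) * sigma
q = ((eps @ W) * eps).sum(axis=1)
print(exact, frobenius, q.mean(), q.var())
```

The sample mean of `q` should be close to zero and its sample variance close to `exact`, which equals `frobenius` up to rounding.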



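Lemma 2. Let $\underline{\sigma} = \inf_{x\in[0,1]^p}\sigma(x) > 0$, $\bar\sigma = \sup_{x\in[0,1]^p}\sigma(x) < \infty$ and $\nu \in (0,1/2)$. Under Assumption E, there is an absolute constant $C = C_\nu > 0$ such that:

(i) If $(\bar\sigma^4\,\mathrm{Sp}_n^2[W_h])/(\underline{\sigma}^4 N_n^2[W_h]) \le \nu$,

$$\sup_{z\in\mathbb{R}}\big|P_n(\varepsilon'W_h\varepsilon \le v_h z) - P(N(0,1) \le z)\big| \le C\left(\frac{\bar\sigma\,\mathrm{Sp}_n[W_h]}{\underline{\sigma}\,N_n[W_h]}\right)^{1/4}.$$

(ii) For all $h \in \mathcal{H}_n\setminus\{h_0\}$ and any $z > 0$, if $(\bar\sigma^4\,\mathrm{Sp}_n^2[W_h - W_{h_0}])/(\underline{\sigma}^4 N_n^2[W_h - W_{h_0}]) \le \nu$,

$$P_n\left(\frac{\varepsilon'(W_h - W_{h_0})\varepsilon}{v_{h,h_0}} \ge z\right) \le \frac{\sqrt{2}}{\pi z}\exp\left(-\frac{z^2}{2}\right) + C\left(\frac{\bar\sigma\,\mathrm{Sp}_n[W_h - W_{h_0}]}{\underline{\sigma}\,N_n[W_h - W_{h_0}]}\right)^{1/4}.$$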
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Proof. Let 2 = D n 1 (a)e, so that E n [ej] = and Var n [ej] = 1 for all 
i, and let W = Kj]i<i,j<n be D n (a)W h D n (a) or D n (a)(W h - W ho )D n (a), 
so that for v 2 = N 2 [W] = Ei<i,j<n w ij, e'We/v is e'W h e/v h or e'{W h - 
Wh )s/vh,h Q -, respectively. Let Ai, . . . , A n be the real eigenvalues of W, 

n / n \ 3/2 n n 



c -1 



Wij\ 



i=i \j=i 



--ij=i 



1 



and A n = — A 



i=l 



Consider a vector $g$ of $n$ independent $N(0,1)$ variables, independent of the $X_i$. Theorem 3 of Rotar' and Shervashidze (1985) says that there is an absolute constant $C > 0$ such that



sup 



2We 



< z 



g'Wg 



< z 



< C [l - ln(l - 2A n )] 3 / 4 £y 4 if A n < 1/2. 

Let $\{b_i \in \mathbb{R}^n\}_{1\le i\le n}$ be an orthonormal system of eigenvectors of $W$ associated with the eigenvalues $\lambda_i$. As $E_n[g'Wg] = 0$ by Lemma 1, $g'Wg = \sum_{i=1}^n \lambda_i (b_i'g)^2 = \sum_{i=1}^n \lambda_i[(b_i'g)^2 - E_n[(b_i'g)^2]]$. Hence, $g'Wg$ has the same conditional distribution as $\sum_{i=1}^n \lambda_i\zeta_i$, where the $\zeta_i$ are centered Chi-squared variables with one degree of freedom, independent among themselves and of the $X_i$. The Berry-Esseen bound of Chow and Teicher [(1988), Theorem 3, page 304] yields that there is an absolute constant $C > 0$ such that

'g'Wg 



sup 



< z 



F(N(0,l)<z) 



< Q 2^j=\ \ A i\ 



The two above inequalities together imply that if A n < 1/2, 
'e'We 



(6.2) 



sup 

zeK 



< z 



P(JV(0,1) <z) 



<C 



(l-ln(l-2A n )) 3/4 £y 4 + 



Let {ei,i = l,...,n} be the canonical basis of R n , so that 
Then 

n / n \ 3/2 

EE' 

i=i \j=i 



1/v^. 



<4 



Ell^ e i||n iitj/ 1 1 2 
— ^i" 11 — n\\Wei" 



-» lln 



i=l 



E 

l<i,j'<n 



<S Pn [^]x £ «,2 = 
l<i,j'<n 

2 |(ei,Wej) n | 



Sp n [W]iV n 2 [^], 



E 

l<i ,j<n 



W 



\e-i 



1 1 1 n 1 1 °j 1 1 n 



< 



E 



l<i,j<n 



-3 ||n 
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Hence, using v 2 = Ya=i = N%[W] and | A»| < Sp n [W] for all i, we obtain 



Sp 2 n [W] 



C n <A2 



N n [W] 



and 



^|A,J 3 . Sp w [Wl < / Sp w [W] \ 1 / 4 



E 

1=1 



< 



N n [W] -\N n [W]J 



since Sp n [W] /-/VnfW] < 1 for any symmetric W. The above inequalities and 
(6.2) give 



(6.3) 



sup 



e'We 



< z^j -P(JV(0,1) < z) 



<C 



SPn[^] \ V4 

N n [W] 



provided (Sp n [W]/-/V n [W]) 2 < u, for an absolute constant C = C v > 0. 
Part (i) follows by setting W = D n (a)WhD n (a) in (6.3) and noting that 



N n [W]J -\aj \N n [W h ] 

Part (ii) follows from (6.3) with $W = D_n(\sigma)(W_h - W_{h_0})D_n(\sigma)$ and Mills' ratio inequality. □
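
For reference, the Mills' ratio inequality invoked in the last step can be checked numerically in its standard one-sided form, $P(N(0,1) \ge z) \le \phi(z)/z$ for $z > 0$. The sketch below uses only the standard library; the function names are illustrative, and the constant in front of the exponential in Lemma 2(ii) may differ from the one used here.

```python
import math

def upper_tail(z):
    """P(N(0,1) >= z), computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def mills_bound(z):
    """Mills' ratio bound phi(z) / z = exp(-z^2 / 2) / (z * sqrt(2 * pi)), z > 0."""
    return math.exp(-z * z / 2.0) / (z * math.sqrt(2.0 * math.pi))

grid = (0.5, 1.0, 2.0, 4.0)
for z in grid:
    print(z, upper_tail(z), mills_bound(z))
```

The bound is loose for small $z$ and becomes sharp as $z$ grows, which is the regime relevant when $z$ is of the order of $\gamma_n \to \infty$.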



6.2. Variance estimation. The following results are proven in Guerre and 
Lavergne (2003). 

Proposition 2. Under Assumptions D and W, $v_{h_0}^2 \asymp_P h_0^{-p}$ and, uniformly in $h \in \mathcal{H}_n\setminus\{h_0\}$, $v_{h,h_0}^2 \asymp_P h^{-p} - h_0^{-p}$.

Proposition 3. Let {m n (-)} n >i be an equicontinuous sequence of re- 
gression functions. 

(i) Under Assumptions D and E, if b n — ► and n 1-4 / rf 6^ — > oo, then 
(3.5) holds. 

(ii) Let {Wh, h £ TC n } be any collection of nonzero nxn symmetric matri- 

IT p , 

ces with zeros on the diagonal. Under (3.5), — ► 1 and max^^u^i | a ' — 
1|=op(1). 



6.3. The parametric estimation error. 



Lemma 3. Let $W$ be an $n \times n$ symmetric matrix depending upon $X_1,\ldots,X_n$, $\bar\theta_n$ be as in (6.1) and $B_n(R) = \{\theta \in \Theta;\ \frac{1}{n}\sum_{i=1}^n (\mu(X_i;\theta) - \mu(X_i;\bar\theta_n))^2 \le R^2\}$. Under Assumptions E and M, there is an absolute constant $C = C_{d'} > 0$ such that, for any $m_n(\cdot)$, any $n$ and any $R > 0$,

$$E_n\Big[\sup_{\theta\in B_n(R)}\big|\sqrt{n}\,\big(W(\mu(X;\theta) - \mu(X;\bar\theta_n)),\varepsilon\big)_n\big|\Big] \le C\,\mathrm{Sp}_n[W]\,R\max_{1\le i\le n} E_n^{1/d'}\big[|\varepsilon_i|^{d'}\big].$$



Proof. Without loss of generality, we can assume that maxi<j< n E 1 /^' 
(i = Sp n [W] = 1. Let 5 w (-;9) = W{p{-;9) - p(-,0 n )). The Marcinkie- wicz- 
Zygmund inequality, see Chow and Teicher (1988), yields, under Assumption 
E and for any 9,9' in O, that there is an absolute constant C such that 



- 7 =Y,( 5 w(X i ;e)-5 w {X i ;e'))e i 



i=l 



<C 



1 n 

-^2(S w (X i ;9)-d w (X i ;9')) 2 E 2 n / d '\s i f 



i=i 



1/2 



< C\\W(p(X; 9) - ^X; 9')) \\ n < C||/x(X; 9) - M (X; 8')\\ n . 

Let M n {t, R) be the smallest number of 9) — fi(X; 0')|| n -balls of radius t 

covering B n (R). It follows from van der Vaart [(1998), Example 19.7] and 
Assumption M that, for some absolute constant C > 0, M n {t, R) < C'(R/t) d . 
The Holder inequality and Corollary 2.2.5 from van der Vaart and Wellner 
(1996) give, as d/d' < 1, 



E„ sup 

eeB n {R) 



1 



—j= ^ Sw (Xj ; fl)g. 



i=l 



<¥}J d ' 



sup 

0eB n (R) 



1 



i=l 



rR/J}\d/d' 



□ 



Lemma 4. Under Assumptions E and M, i/iere is an absolute constant 
C = Cd' > 0, such that, for any p large enough, any m n { ) and any n, 



\m n (X)-p(X;9 n )\\ n >V3\\m n (X)-p(X;9 n )\\ n + 



V2p 



< 



Cmax 1<i<n E 1 n /d [\ei\ d '\ 
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PROOF. The definition (3.1) of 9 n yields, see van de Geer (2000), 
\\m n {X) - ii(X-6 n )\\l 



< 20u(X; 9 n ) - fj,(X; n ),e) n + \\m n (X) - //(X; 9 n )\\ 2 nl 

< 4(//(X; 9 n ) - fJ,(X; n ),e) n + 4||m n (X) - /x(X; n ™ 



(6 - 4) |KX;0 n )-/x(X;0 n )||2 



n ' 

2 



Consider a fixed r > 1 and any p>r. Let £ n = {||m n (X) — fi(X;9 n )\\ n < 
(n(X; n ) — n(X; 9 n ),e) n }, so that on the complement of this event ||m n (X) — 
fi(X; 6 n )\\ n < \f3\\m n {X) — fi(X ; 9 n )\\ n by (6.4). Lemma 4 follows by bound- 
ing 

F n ^V3\\m n (X)-fi{X;e n )\\ n + ^^j < \\m n (X) - fx(X; 9 n )\\ 2 n and £, 

< P n 2\\m n (X) - (i(X; 6 n )\\ 2 n + 

\ n 

< 2\\m n {X) - fi{X; e n )f n + 2\\fi(X;9 n ) - fi(X; 9 n )\\ 2 n and £ n 

,2.1 



\ < MX; 6 n ) - fi(X; e n )\\ 2 n and . 



Let 5, = S jtn = {9e &;ri/rfi<\\ii(X;e)-n(X;O n )\\ n < r^ 1 /^} C B n {r^ +1 /^i) 
with B n {-) as in Lemma 3. Then (6.4), the definition of £ n , the Markov in- 
equality and Lemma 3 with W = Id n yield 



< ||//(X;0 n )-/ipr;0 n )||2 and£, 



n 

(~ r 2j 
<E P «K gS ; ^d — <(fi(X;9 n )-fi(X;9 n ),e) 



hoo , r 2j 



<E P «7T^ su P I V^(M(^; - 0„),e) n | 



+oo o / — 



sup |v^(/x(A';0)-/i(A';0 n ),e) n | 



<CmaxEy d '[| e 4 d ']y 

KKn f— ' 



+ 00 o+l , 



3=J 

r 2 Cm^ 1<i<n ^ n ,d '[\e i \ d '] 



r — 1 r 



J ■ □ 
Lemma 5 is proven in Guerre and Lavergne (2003).

Lemma 5. Consider the local alternatives of Theorem 3 and let the con- 
ditions of Theorem 3 on hold. Under Assumptions E and M and if 
lim n ^ +00 y/nr n = +oo, 



\m n (X)-p(X;9 r , 



r n -op(r n ) and \\p(X;t 



■ M (X; 



>0) h 



Proposition 4. [/nder Assumptions E, M and W0(ii) ; i//i — > 0, i/ien, 
for any {m n (-)} n >i C H , 



max 
h£H n \{ho} 



f h -f ho -e\W h -W ho )e 



•(1), 



^ /2 (T h0 -^ e)= Op (l). 



(h-p - ho p y/ 2 

Let h n G 7i n be an arbitrary sequence of smoothing parameters. Then under 
H or Hi, 

(m n (X) - n(X, e n ))'W h e = O p (1) [v^||m n (X) - M (X, n ) ||„ + 1] . 



(6.5) 



Proof. We have 

f h = (m n (X) - p(X-9 n ))'W h (m n (X) - fi{X;9 n )) 
+ 2(m n (X) - 9 n ))'W h e + e'W h e. 

The Cauchy-Schwarz inequality, Assumptions E and W0(ii) and Lemma 4 
yield uniformly in h G 

| (m n (X) - 0„)) '^K(X) - /x(X; 9 n ))\ 
< n max S Pn [^]||m n (A) - 

= Op[(l + v^||m„(X) - /i(X; e n )\\ n ) 2 ) = Op(1) 

under i/ch as ||m n (X) — //(X; n )|| n = 0. Since for any h G H n , h~ p — h p > 
h^ p — Kq P = /iQ P (a p — 1) — > +oo, we obtain that, under Hq, 



max 

h£H n \{h } 



(m n (X) - p(X; 9 n ))'{W h - W ho )(m n {X) - fx(X; 9 n )) 



h P /2 (m n (X) - ii(X; 9 n ))'W ho (m n (X) - fi(X; 9 n )) = o P (l). 
(6.6) 

Since \\n(X;0 n )-ii(X;9 n )\\ n < MX;9 n )-m n (X)\\ n + \\m n (X)- p,(X;9 n )\\ n , 
Lemma 4 and Assumption E yield f n (9 n ^ B PjV ) < C/p for any p large 
enough, any m n {-) and any n, where 



(h-p- V) 1/2 



op(l), 



0GG; 



|HX;0) - /z(*A)lln < (Vs + l)\\m n (X) - n{X-9 n )\\n + 
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(6.7) 



sup \{p{X,9)-p{X-6 n ))'We\ 
< CpSp n [W](yfti\\m n (X) - n(X; 9 n )\\ n + 1) 



Taking W = Wh and using the Markov inequality, (6.5), (6.6), m n (X) 
p(X;9 n ) = 0, Assumption W0(ii) and ho — > then show that h^ 2 (Th 
e'Wh e) = op(l) under H . Taking W = Wh — Wh in (6.7) and using h 
h^a" 3 for some j = 0, . . . , J n yields, under Hq, 



max 
hen„\{h } 



(p(X,9 n )-KX;9 n ))'(W h -W ho )e 



> e 



< 



p,n ) 



1 



sup 

6 heH n \{h } eeB p< 

c , p 
e 



(h-p - h p y/ 2 

(p(X,9) - p(X;9 n ))'(W h -W ho )e 



(h-p - ho p y/ 2 



1 



c + p -o r {hl'% 



for all e > 0. The last result follows from (6.7) with W = and 

E n [((m n (X)-^(X;a„))W) 2 ]<nSp2(^0a 2 ||m n (X)-/i(X;^)||2. □ 

6.4. Proof of Theorem 1 under Assumption W0. Under Assumptions W0(iii) and E, $v_{h,h_0} \asymp N_n[W_h - W_{h_0}] \asymp_P (h^{-p} - h_0^{-p})^{1/2}$ uniformly in $h \in \mathcal{H}_n\setminus\{h_0\}$; see Lemma 1. Therefore, Propositions 3(ii) and 4 yield



max 

hen n \{h Q } 



T h -T, 



ho 



v h,h 

Let rj be as in (3.8). Observe that 



(1 + op(1)) x max 

hen n \{h } 



e'(W h -W ho )e 



Vh,h 



+ Qp(1). 



< 



max 

hen n \{h } 



max 

h&H n \{h } 



Th — Th 



> If 

v h,h 

s'(W h -W h0 )e 



Vh,h 



> 



In 



1 + V/2 



+ Qp(l). 



Applying Lemma 2(H) using Assumption WO (Hi) and hj = hoa 3 for j 
0, . . . , J n , we obtain 



h£H n \{h } 



e'(W h - W ho )e 



Vh,h 



> 



1+77/2 



+ op(l) 
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V2(l + V /2) ( If 7 „ 



+00 -. 

+ °M /8 ) £ {apj _ -Ql/8 + «*(!) = <*(!)> 

using (3.8), /i ->• and 7„ ->• 00. Thus, P n (^ - ^o z a) = ^n(T ho > v ho z a ) + 
op(l). Theorem 1 then follows from Propositions 3(h) and 4, Lemma 2(i) 
and Assumption WO. 

6.5. Proof of Theorem 4 under Assumptions D and W0. Let $\varepsilon^* = [\varepsilon_1^*,\ldots,\varepsilon_n^*]'$. We first establish a moment bound that plays the role of Assumption E. As $\varepsilon_i^* = \hat\sigma_n(X_i)\omega_i$, where the $\omega_i$ are independent of the initial sample, $E[|\varepsilon_i^*|^{d'} \mid X_1,Y_1,\ldots,X_n,Y_n] = E[|\omega_1|^{d'}]\,|\hat\sigma_n(X_i)|^{d'}$ and

(6.8) $\max_{1\le i\le n} E[|\varepsilon_i^*|^{d'} \mid X_1,Y_1,\ldots,X_n,Y_n] \le E[|\omega_1|^{d'}]\Big[\sup_{x\in[0,1]^p}\sigma^{d'}(x) + o_P(1)\Big].$

This is sufficient to establish Theorem 4; see Guerre and Lavergne (2003).

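6.6. Proof of Theorem 2 under Assumptions W0 and W1.

Lemma 6. Consider a function $\delta(\cdot) \in C_p(L,s)$ with $s > \underline{s}$ and $L > 0$. Consider any sequence $h_n$ from $\mathcal{H}_n$ and let $\Lambda_n = \Lambda_n(s,h_n)$ be as in Assumption W1(iii). Under Assumption W1, we have

$$\delta(X)'W_{h_n}\delta(X) \ge n\Big[\big(\Lambda_n - \mathrm{Sp}_n^{1/2}[W_{h_n} - P_{h_n}]\big)\|\delta(X)\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)C_4 L h_n^s\Big]^2,$$

where $C_4 = C_4(s)$ is from Assumption W1(ii), provided

(6.9) $\|\delta(X)\|_n \ge \dfrac{\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]}{\Lambda_n - \mathrm{Sp}_n^{1/2}[W_{h_n} - P_{h_n}]}\,C_4 L h_n^s \ge 0.$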

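Proof. We have $\delta'W_{h_n}\delta = \delta'P_{h_n}\delta + \delta'(W_{h_n} - P_{h_n})\delta \ge \delta'P_{h_n}\delta - n\,\mathrm{Sp}_n[W_{h_n} - P_{h_n}]\,\|\delta\|_n^2$. Let $\pi(\cdot)$ be such that $\sup_{x\in[0,1]^p}|\delta(x) - \pi(x)| \le C_4 L h_n^s$; see Assumption W1(ii). Because $P_{h_n}$ is positive by Assumption W1(i), the triangle inequality and the definition of $\Lambda_n$ yield

$$\begin{aligned}
\Big(\frac{\delta'P_{h_n}\delta}{n}\Big)^{1/2}
&\ge \Big(\frac{\pi'P_{h_n}\pi}{n}\Big)^{1/2} - \Big(\frac{(\delta-\pi)'P_{h_n}(\delta-\pi)}{n}\Big)^{1/2}\\
&\ge \Big(\frac{\pi'P_{h_n}\pi}{n}\Big)^{1/2} - \mathrm{Sp}_n^{1/2}[P_{h_n}]\,\|\delta-\pi\|_n\\
&\ge \Lambda_n\|\delta + \pi - \delta\|_n - \mathrm{Sp}_n^{1/2}[P_{h_n}]\,\|\delta-\pi\|_n\\
&\ge \Lambda_n\|\delta\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)\|\delta-\pi\|_n\\
&\ge \Lambda_n\|\delta\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)C_4 L h_n^s.
\end{aligned}$$

As $(\Lambda_n - \mathrm{Sp}_n^{1/2}[W_{h_n} - P_{h_n}])\|\delta\|_n - (\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}])C_4 L h_n^s \ge 0$ from (6.9),

$$\begin{aligned}
\frac{\delta'W_{h_n}\delta}{n}
&\ge \Big[\Lambda_n\|\delta\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)C_4 L h_n^s\Big]^2 - \mathrm{Sp}_n[W_{h_n} - P_{h_n}]\,\|\delta\|_n^2\\
&= \Big[\big(\Lambda_n - \mathrm{Sp}_n^{1/2}[W_{h_n} - P_{h_n}]\big)\|\delta\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)C_4 L h_n^s\Big]\\
&\quad\times\Big[\big(\Lambda_n + \mathrm{Sp}_n^{1/2}[W_{h_n} - P_{h_n}]\big)\|\delta\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)C_4 L h_n^s\Big]\\
&\ge \Big[\big(\Lambda_n - \mathrm{Sp}_n^{1/2}[W_{h_n} - P_{h_n}]\big)\|\delta\|_n - \big(\Lambda_n + \mathrm{Sp}_n^{1/2}[P_{h_n}]\big)C_4 L h_n^s\Big]^2. \qquad\square
\end{aligned}$$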



We now prove Theorem 2 under Assumptions W0 and W1, using the power bound (2.3). Take $h_n = h_0 a^{-j_n}$, where $j_n$ is the integer part of



1 



In 



L 2 n 



lna[As+p \7ninf xe [ ,i]P<T 2 (a;) 



+ In h 



In 



L 2 n 



hiaAs + p V7ninf xg [ 0i i]pO- 2 (a 

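using $\ln h_0 = O(\ln\ln n)$ and $\ln(n/\gamma_n) \ge (1 - \gamma)\ln n$ for some $\gamma \in (0,1)$. Note that $h_n$ is in $\mathcal{H}_n$ for all $s > \underline{s}$ and $L > 0$ since $h_{J_n} \asymp (\ln n)^{C_2/p}/n^{2/(4\underline{s}+p)}$ for some $C_2 > 1$ and $\gamma_n \le n^{\gamma}$ for some $\gamma \in (0,1)$. We have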



Lhi x Lf/( 4s+ P) f ^ 
n 



2s/(As+p) 



and 



nL 2 ^ x 7n a 2 /^/ 2 x L 2 ^( 4s+ ^(V7 2 7 n ) 4s /( 4s+ P)^/( 4s+ P) - 00. 



Take now 5(-) = m n (-) — (j,(-;Q n ) in Lemma 6, which belongs to C p (L,s) by 
the assumptions of Theorem 2. The lower bound (3.10) of Theorem 2 yields 

||£pOHn > \\m n (X) - fi(X;6 n )\\ n > C Kl Lh n (l + op(l)), 

implying, in particular, that n\\m n (X) — fi(X;6 n )\\ n diverges in probability. 
Under Assumptions W0(ii) and Wl(i), (hi), 



c Ki lk > ^h n ) + $y l ' 2 [p hn } ^ o 

A n {s,h n )-S Vn /2 {W hn -P hn ] 



1 
for $\kappa_1$ large enough, showing that $\delta(\cdot)$ verifies (6.9) with probability tending to 1. Therefore, Lemma 6 and Assumption W1(iii) yield

(m n (X) - fi(X;9 n ))'W hn (m n {X) - fi(X;9 n )) 
= 8'(X)W h J(X) 

> n[(A n - Spl/ 2 [W hn - P hn ])\\m n (X) - /x(X; 6 n )\\ n 

- (A n + S P y 2 [P,J)C 4 L^] 2 (l + o P (l)) 

> C{1 + o F (l))n\\m n (X) - fi(X; 9 n )\\ 2 n > C(l + o ¥ (l))nK 2 L 2 h 2 n s . 
Moreover, by Proposition 4, 

(m n (X) - v(X; e n ))'W hn e = Op(^||m n (X) - fi(X; 9 n )\\ n ) 

= o ¥ (n\\m n (X) - fi(X;9 n )\\ 2 n ). 

From e'W hn e = ¥ (v hn ) = P (hn P/2 ) = o P (nL 2 h 2 n s ) and (6.5), 

f hn > C(l + o v {l))n\\m n {X) - fi(X; 9 n )\\ 2 n > C(l + o F (l))nK 2 L 2 h 2 n s . 

Proposition 3(h), Lemma 1 and Assumption WO(iii) yield z a Vh +^nVh n ,h X P 

In^hnM) X P lrL^ 2 hn P ^ 2 x nLp'h 2 ^ . Collecting the leading terms implies that, 
for K\ large enough, 

T hn - z a v ho - "fnVhnM ^ CnL 2 h n s (nl - C')(l + o P (l)) +oo. 

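6.7. Proof of Theorem 3 under Assumptions W0 and W1. The proof follows the lines of the proof of Theorem 2, using now (2.4). Since $m_n(X) - \mu(X;\hat\theta_n) = r_n\delta_n(X) + \mu(X;\theta_0) - \mu(X;\hat\theta_n)$,

$$\begin{aligned}
(m_n(X) - \mu(X;\hat\theta_n))'W_{h_0}(m_n(X) - \mu(X;\hat\theta_n))
&= r_n^2\,\delta_n(X)'W_{h_0}\delta_n(X)\\
&\quad + 2r_n\,\delta_n(X)'W_{h_0}(\mu(X;\theta_0) - \mu(X;\hat\theta_n))\\
&\quad + (\mu(X;\theta_0) - \mu(X;\hat\theta_n))'W_{h_0}(\mu(X;\theta_0) - \mu(X;\hat\theta_n)).
\end{aligned}$$

By Lemma 5,

$$\begin{aligned}
|r_n\,\delta_n(X)'W_{h_0}(\mu(X;\theta_0) - \mu(X;\hat\theta_n))|
&\le n r_n\,\mathrm{Sp}_n[W_{h_0}]\,\|\delta_n(X)\|_n\,\|\mu(X;\theta_0) - \mu(X;\hat\theta_n)\|_n = o_P(n r_n^2),\\
|(\mu(X;\theta_0) - \mu(X;\hat\theta_n))'W_{h_0}(\mu(X;\theta_0) - \mu(X;\hat\theta_n))|
&\le n\,\mathrm{Sp}_n[W_{h_0}]\,\|\mu(X;\theta_0) - \mu(X;\hat\theta_n)\|_n^2 = o_P(n r_n^2).
\end{aligned}$$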
Because $\{\delta_n(\cdot)\}_{n\ge 1} \subset C_p(L,s)$ with $s > \underline{s}$, Lemma 6 yields, under (3.11) and $h_0 \to 0$,

5 n {X)'W h MX) > (1 + o P (l))n[(A n - Spl/ 2 [W h0 - P ho ])\\5 n (X)\\ n 

-^(An + Spy 2 ^])^] 2 

>Cn(l + op(l)). 

Equation (6.5) in the proof of Proposition 4 and Lemma 5 give, since z a Vh + 
e'Wh s = Op(h ~ p ^ 2 ), nrffij 2 — > +oo and h — > 0, 

^k ~ z «^o - lnVh ,h + op(l))Cnr£ + O p (/iq p/2 ) +oo. 

6.8. Proof of Proposition 1. We only detail the case of Examples 1a and 1b. The proof of Proposition 1 for Example 2 can be found in Guerre and Lavergne (2003).

The functions $\psi_k(\cdot)$ can be changed into any system generating the same linear subspace of $\mathbb{R}^n$. Consider the following orthonormal basis of $L_2([0,1]^p, dx)$:

p 

I y/2k e + lQk e (xt)I(x G [0, If) for Example la, 

e=i 

p 

hr v ' 2 [] V2h + iQ qe (k e h - x e )i(x e h{h)) 
1=1 

for Example lb, 

where the $Q_k(\cdot)$ are the Legendre polynomials of degree $k$ on $[0,1]$, with $\sup_{t\in[0,1]}|Q_k(t)| \le 1$, $\int_0^1 Q_k^2(t)\,dt = 1/(2k+1)$, $\int_0^1 Q_k(t)Q_{k'}(t)\,dt = 0$ for $k \ne k'$; see, for example, Davis (1975). Let $\Phi_h = [\phi_k(X), 1 \le |k| \le 1/h]$ for Example 1a and $\Phi_h = [\phi_{qkh}(X), 1 \le |q| \le \bar q, 1 \le |k| \le 1/h]$ for Example 1b. Define $d_h$ as the number of columns of $\Phi_h$ and note that in both examples $d_h$ is of order $h^{-p}$.
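
The stated properties of the Legendre polynomials on $[0,1]$ can be verified numerically in one dimension. The sketch below takes $Q_k(t) = P_k(2t - 1)$, with $P_k$ the classical Legendre polynomial on $[-1,1]$, and integrates with Gauss-Legendre quadrature, which is exact for the polynomial degrees involved; the function name `phi` and the choice $K = 6$ are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def phi(k, t):
    """Normalized shifted Legendre polynomial sqrt(2k + 1) * Q_k(t) on [0, 1]."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0                                  # selects P_k in the Legendre basis
    return np.sqrt(2 * k + 1) * legendre.legval(2.0 * t - 1.0, coef)

# Gauss-Legendre nodes on [-1, 1], mapped to [0, 1] (weights are halved).
nodes, weights = legendre.leggauss(20)
t = (nodes + 1.0) / 2.0
w = weights / 2.0

K = 6
basis = np.array([phi(k, t) for k in range(K + 1)])   # shape (K + 1, 20)
gram = (basis * w) @ basis.T                          # integrals of phi_k * phi_k'

grid = np.linspace(0.0, 1.0, 1001)
sup_Q = max(np.max(np.abs(phi(k, grid))) / np.sqrt(2 * k + 1) for k in range(K + 1))
```

The Gram matrix is the identity up to rounding, which is the orthonormality used for (6.10), and `sup_Q` does not exceed 1.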

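Lemma 7. If $f(\cdot)$ is bounded away from 0 and infinity on $[0,1]^p$, there is a $C > 0$ such that

$$\max_{h\in\mathcal{H}_n}\mathrm{Sp}_{d_h}\big[(n^{-1}\Phi_h'\Phi_h)^{-1}\big] \le C$$

and

$$\max_{h\in\mathcal{H}_n}\mathrm{Sp}_{d_h}\big[n^{-1}\Phi_h'\Phi_h\big] \le C \quad\text{with probability tending to 1,}$$

provided $h_{J_n}^{-p} = o(n/\ln n)^{1/3}$ in Example 1a and $h_{J_n}^{-p} = o(n/\ln n)$ in Example 1b.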


(6.10) 



4>k(x) 



<f>qkh(x) 
Proof. Consider first Example 1a. As the $n^{-1}\Phi_h'\Phi_h$, $h \in \mathcal{H}_n$, are nested Gram matrices, it is sufficient to consider the spectral radii of $n^{-1}\Phi_{h_{J_n}}'\Phi_{h_{J_n}}$ and its inverse. We have

p 



\MXr)<f>k>{Xi)\ < J] v / 2^+l V /2^ + 1 < Chjf, 
1=1 

< sup \</> k (x)\ sup |^(^)|E 1/2 ^(^i)E 1/2 4'(^) 

xe[0,l]P x6[0,l]P 

< Ch~j p , 

as E(p1(X) < sup^g^ ^p /(x) / </>f (x) dx = sup^gjo^p f{x). The Bernstein in- 
equality then yields 



lnh p T 

— sup 

mrl 0<|fc|,|fc'|<l/h Jn 



iy> fc (Xi)0*(XO-E0jfe(X)M*) 



Op(l). 



This gives n = n X E<1>1 +i?h 7 , where Ru, is a <4, x 

"Jn "Jn ftj n "Jn "Jn ' "Jn "Jn 



<i/j Jn matrix whose elements are uniformly Of{Jhm/nh P j n ). Thus 



Sp, h/n [RkJ < N dhjn [R hj J =oJ J LJ^)= 0P (1), 



as /ij^ = o(n/ Inn) 1 / 3 . Hence, the eigenvalues of n~ l <&' hj &h Jn are between 
the smallest and largest eigenvalues of n^ 1 E^' hj &h Jn , with probability tend- 
ing to one. But, for any a S M. dhj ^ , 



rrVESl, $ ftr a = E 

"Jn "Jn 



E «^-Po) 

'0<|fc|<l/fc Jn / 

/ E a k4>k(x)) dx = a't 

J ^ P \o<\k\<i/ hj „ J 



:|*I<V»j b 

since the $\phi_k(\cdot)$ are orthonormal in $L_2([0,1]^p, dx)$. Therefore, the eigenvalues of the symmetric matrix $n^{-1}E\Phi_{h_{J_n}}'\Phi_{h_{J_n}}$ are bounded away from 0 and infinity when $n$ grows. Example 1b is studied in Baraud (2002) and follows from similar arguments. □

We now return to the proof of Proposition 1 for Example 1. Lemma 7 
implies that, for some C > 1, 
with probability tending to 1, where $\prec$ is the ordering of symmetric matrices. Because $p_{ii}(h) = e_i'P_he_i$, where $\{e_i\}_{1\le i\le n}$ is the canonical basis of $\mathbb{R}^n$, this gives

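(6.11) $|p_{ii}(h)| \le \begin{cases} \dfrac{C}{n}\displaystyle\sum_{|k|\le 1/h}\phi_k^2(X_i) \le \dfrac{C}{nh^{2p}}, & \text{for Example 1a},\\[2ex] \dfrac{C}{n}\displaystyle\sum_{|k|\le 1/h,\,|q|\le\bar q}\phi_{qkh}^2(X_i) \le \dfrac{C}{nh^{p}}, & \text{for Example 1b},\end{cases}$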



with probability going to 1 and uniformly in $i = 1,\ldots,n$ and $h \in \mathcal{H}_n$. Indeed, $\phi_k^2(\cdot) \le Ch^{-p}$ for all $|k| \le 1/h$ for Example 1a, while $\phi_{qkh}^2(X_i)$ vanishes except for exactly one index $k$ with $\phi_{qkh}^2(X_i) \le Ch^{-p}$ for Example 1b.

To prove Assumption W0(ii), note that $\mathrm{Sp}_n[P_h] = 1$ since $P_h$ is an orthogonal projection. The triangular inequality gives $\max_{h\in\mathcal{H}_n}\mathrm{Sp}_n[W_h] \le 1 + \max_{h\in\mathcal{H}_n}\max_{1\le i\le n}|p_{ii}(h)| = O_P(1)$ by (6.11) and the restriction on $h_{J_n}$, which gives $h_{J_n}^{-2p} = o(n)$ for Example 1a and $h_{J_n}^{-p} = o(n)$ for Example 1b. For Assumption W0(iii), we have

$$N_n^2[W_h] = N_n^2[P_h] - N_n^2[W_h - P_h],$$
$$N_n^2[W_h - W_{h_0}] = N_n^2[P_h - P_{h_0}] - N_n^2[(W_h - P_h) - (W_{h_0} - P_{h_0})].$$

Now $N_n^2[P_h] = \mathrm{Rank}[P_h]$ and $N_n^2[P_h - P_{h_0}] = \mathrm{Rank}[P_h - P_{h_0}]$, since $P_h$ and $P_h - P_{h_0}$ are orthogonal projections. This gives $N_n^2[P_h] \asymp h^{-p}$ and $N_n^2[P_h - P_{h_0}] \asymp h^{-p} - h_0^{-p}$ almost surely for Example 1a, and for Example 1b, using the Bernstein inequality with $h_{J_n}^{-p} = o(n/\ln n)$, ensuring that the number of $X_i$ in each bin $I_k(h)$ diverges. Then, since $N_n^2[W_h - P_h] = \sum_{i=1}^n p_{ii}^2(h)$, Assumption W0(iii) holds if

$$\max_{h\in\mathcal{H}_n} h^{p}\sum_{i=1}^n p_{ii}^2(h) = o_P(1)$$

and

$$\max_{h\in\mathcal{H}_n\setminus\{h_0\}} (h^{-p} - h_0^{-p})^{-1}\sum_{i=1}^n \big(p_{ii}(h) - p_{ii}(h_0)\big)^2 = o_P(1),$$

which is a consequence of (6.11), together with $h_{J_n}^{-3p} = o(n/\ln n)$ for Example 1a and $h_{J_n}^{-p} = o(n/\ln n)$ for Example 1b. To show Assumption W1(i), note that the $P_h$ are symmetric positive semidefinite with $\max_{h\in\mathcal{H}_n}\mathrm{Sp}_n[W_h - P_h] = o_P(1)$, as shown when establishing Assumption W0(ii). For Assumption W1(ii), (iii), consider first Example 1a. Let $\Pi_{s,h}$ be the set of polynomial functions with order $1/h$, which are such that Assumption W1(ii) holds by the multivariate Jackson theorem; see, for example, Lorentz (1966). This choice of $\Pi_{s,h}$ gives $\Lambda_n = 1$ almost surely by definition of the $P_h$ with $h_{J_n}^{-p} = o(n)$ and Assumption D. For Example 1b, the proof of Assumption W1(ii) uses the same Taylor expansion as in Guerre and Lavergne (2002) to build the $\Pi_{s,h}$. Assumption W1(iii), for any given $\bar q$, is a consequence of Assumption W1(iii) for $\bar q = 1$. This can be shown using Guerre and Lavergne (2002) and establishing convergence of local empirical moments with repeated applications of the Bernstein inequality.
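
The norm identities used above for Assumption W0(iii) are easy to confirm numerically. The following sketch uses a hypothetical Gaussian design of full column rank in place of $\Phi_h$; `frob2` denotes the squared Frobenius norm, which is how $N_n^2[\cdot]$ acts on these matrices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 8
Phi = rng.standard_normal((n, d))        # hypothetical full-rank design
Q, _ = np.linalg.qr(Phi)                 # orthonormal basis of the column space
P = Q @ Q.T                              # orthogonal projection of rank d
W = P - np.diag(np.diag(P))              # same matrix with a zero diagonal

def frob2(M):
    """Squared Frobenius norm, the sum of squared entries."""
    return float(np.sum(M ** 2))

sum_pii2 = float(np.sum(np.diag(P) ** 2))
print(frob2(P), frob2(W), sum_pii2)
```

The output shows that `frob2(P)` equals the rank `d`, and that removing the diagonal lowers the squared norm by exactly the sum of the squared diagonal entries.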